Inference for Multiple Heterogeneous Networks with a Common Invariant Subspace

The development of models and methodology for the analysis of data from multiple heterogeneous networks is of importance both in statistical network theory and across a wide spectrum of application domains. Although single-graph analysis is well-studied, multiple graph inference is largely unexplored, in part because of the challenges inherent in appropriately modeling graph differences and yet retaining sufficient model simplicity to render estimation feasible. This paper addresses exactly this gap, by introducing a new model, the common subspace independent-edge multiple random graph model, which describes a heterogeneous collection of networks with a shared latent structure on the vertices but potentially different connectivity patterns for each graph. The model encompasses many popular network representations, including the stochastic blockmodel. The model is both flexible enough to meaningfully account for important graph differences, and tractable enough to allow for accurate inference in multiple networks. In particular, a joint spectral embedding of adjacency matrices—the multiple adjacency spectral embedding—leads to simultaneous consistent estimation of underlying parameters for each graph. Under mild additional assumptions, the estimates satisfy asymptotic normality and yield improvements for graph eigenvalue estimation. In both simulated and real data, the model and the embedding can be deployed for a number of subsequent network inference tasks, including dimensionality reduction, classification, hypothesis testing, and community detection. Specifically, when the embedding is applied to a data set of connectomes constructed through diffusion magnetic resonance imaging, the result is an accurate classification of brain scans by human subject and a meaningful determination of heterogeneity across scans of different individuals.


Introduction
Random graph inference has witnessed extraordinary growth over the last several decades (Abbe, 2017; Goldenberg et al., 2009; Kolaczyk, 2017). To date, the majority of work has focused primarily on inference for a single random graph, leaving largely unaddressed the critical problem of modeling the structure of multiple-graph data, including multi-layered and time-varying graphs (Boccaletti et al., 2014; Holme and Saramäki, 2012; Kivelä et al., 2014). Such multiple-graph data arise naturally in a wide swath of application domains, including neuroscience (Bullmore and Sporns, 2009), biology (Han et al., 2004; Li et al., 2011), and the social sciences (Mucha et al., 2010; Szell et al., 2010).
Several existing models for multiple-graph data require strong assumptions that limit their flexibility (Holland et al., 1983; Levin et al., 2017; Nielsen and Witten, 2018; Wang et al., 2017) or scalability to the size of real-world networks (Durante and Dunson, 2018; Durante et al., 2017). Indeed, principled approaches to multiple-graph inference (Arroyo Relión et al., 2019; Ginestet et al., 2017; Kim and Levina, 2019; Levin et al., 2017; Tang et al., 2017a,b) are rather scarce, in part precisely because of the challenges inherent in constructing a multiple-graph model that adequately captures real-world graph heterogeneity while remaining analytically tractable. The aim, here, is to address exactly this gap.
In this paper, we resolve the following questions: first, can we construct a simple, flexible multiple random network model that can be used to approximate real-world data? Second, for such a model, can we leverage common network structure for accurate, scalable estimation of model parameters, while also being able to describe distributional properties of our estimates? Third, how can we use our estimators in subsequent inference tasks, such as multiple-network testing, community detection, or graph classification? Fourth, how well do such modeling, estimation, and testing procedures perform on simulated and real data compared to state-of-the-art methods for such tasks?
Each of these questions is of theoretical and practical significance in its own right. We address all of them, and our unified framework allows for a consistent and comprehensive approach to multiple-network inference. To begin, we present a semiparametric model for multiple graphs with aligned vertices that is based on a common subspace structure between their expected adjacency matrices, but with allowance for heterogeneity within and across the graphs. The common subspace structure has a meaningful interpretation and generalizes several existing models for multiple networks; moreover, the estimation of such a common subspace in a set of networks is an inherent part of well-known graph inference problems such as community detection, graph classification, or eigenvalue estimation (Lyzinski et al., 2017).
Our model describes a class of independent-edge random graphs G^(1), ..., G^(m) whose Bernoulli adjacency matrices A^(i), 1 ≤ i ≤ m, have expectation of the form V R^(i) V^T, where R^(i) is a low-dimensional matrix and V is a matrix with orthonormal columns. Because the invariant subspace defined by V is common to all of the graphs, we call this model the common subspace independent-edge (COSIE) random graph model. The R^(i) matrices are of dimension d × d, need not be diagonal, and can vary with each graph. In particular, despite the shared expectation factor V, each graph G^(i) can have a different distribution. As we show in Proposition 1, our model setting includes stochastic blockmodels (SBMs) with common block assignments but different block connection probability matrices B^(i).
Next, we explore estimation of the model parameters in COSIE, specifically the common invariant subspace V and the individual score matrices R^(i). In Theorem 3, we leverage techniques from Fan et al. (2017) to prove that a joint spectral embedding of the adjacency matrices A^(1), ..., A^(m) leads to an accurate estimate of V, up to an orthogonal transformation. We call this joint spectral embedding procedure the multiple adjacency spectral embedding (MASE), and we denote by V̂ the estimate of V generated by the MASE procedure. In the special case when all the graphs have a shared community structure in the stochastic blockmodel, Theorem 5 shows that an additional clustering step applied to the rows of V̂ can be employed to accurately recover vertex community labels. In Theorem 7, we show that V̂ can also be used to estimate the matrices R^(i) accurately. Under delocalization assumptions on V, which are satisfied in, for example, stochastic blockmodel graphs, we prove that the estimates R̂^(i) have entries that are asymptotically normal with a constant variance and with a bounded bias that decays as the number of graphs m increases.
Subsequently, we compare estimation and testing with MASE to other state-of-the-art procedures for latent space modeling of multiple graphs in simulated stochastic blockmodels, notably 1) the joint embedding first described in Wang et al. (2017); 2) the model for multiple random dot product graphs in Nielsen and Witten (2018); and 3) the omnibus embedding of Levin et al. (2017). We show that for certain graphs arising from the COSIE model, MASE empirically outperforms all of these methods for the estimation of block probability matrices, community detection, and graph classification. We also demonstrate that two-sample test statistics derived from MASE estimators outperform the omnibus embedding for graph hypothesis testing. Specifically, when the inference task is to detect whether two stochastic blockmodels have the same block probability matrices B^(i), the MASE test statistic provides a nontrivial improvement in power. We illustrate this improvement in statistical power in a challenging setting where the block probability matrices under the null and the alternative differ only slightly.
Next, we use COSIE to model a collection of brain networks, specifically data from the HNU1 study (Zuo et al., 2014) consisting of diffusion magnetic resonance imaging (dMRI) scans of 30 different healthy human subjects. Each subject was scanned ten times over a one-month period, resulting in 300 scans. Using NeuroData's NDMG pipeline (Kiar et al., 2018) and a registration to the CC200 atlas of Craddock et al. (2012), we construct a collection of 200-vertex graphs. We apply the MASE procedure to this collection of graphs, obtaining low-dimensional representations of each graph, and then generate a dissimilarity matrix of pairwise distances between these representations. Performing classical multidimensional scaling (CMDS) (Borg and Groenen, 2003) on this dissimilarity matrix results in a two-dimensional plot of these graphs, with clear separation between different subjects. Thus the composition of CMDS with the MASE procedure not only elucidates differences across heterogeneous networks but also recognizes important similarities, and provides a rigorous statistical framework within which biologically relevant differences can be assessed.
The paper is organized as follows. In Section 2, the goal is to briefly and succinctly encapsulate the principal contributions of this paper. As such, in this section, we introduce the COSIE model and provide basic context, and we describe spectral graph inference and the MASE procedure. Thereafter, we present our main theoretical results on consistency and asymptotic normality for the MASE procedure. We then move on to describe the use of MASE on simulated and real data, with which we conclude Section 2. In Section 3 and Section 4, we give a formal treatment of the COSIE model and the MASE procedure. In Section 5, we study the theoretical statistical performance of MASE: we provide a bound on the error of estimating the common subspace, and we establish asymptotic normality of the estimates of the individual graph parameters. In Section 6, we investigate the empirical performance of MASE for estimation and testing when compared to other benchmark procedures. In Section 7, we use MASE to analyze connectomic networks of human brain scans. Specifically, we show the ability of our model and estimation procedure to characterize and discern differences in brain connectivity across subjects. In Section 8, we conclude with a discussion of open problems for future work.

Summary of Key Results
We begin this section by summarizing our multiple random graph model, keeping in mind the overarching goal of establishing a tractable, flexible model for graph heterogeneity. Consider a sample of m observed graphs G^(1), ..., G^(m), with G^(i) = (V, E^(i)), where V = {1, ..., n} denotes a set of n labeled vertices and E^(i) ⊂ V × V is the set of edges corresponding to graph i. Assume the vertex sets are the same (or at least aligned). Assume the graphs are undirected, with no self-loops (it is worth emphasizing, however, that these results can be easily extended to allow for loops, directed edges, or weighted graphs). For each graph G^(i), denote by A^(i) the n × n adjacency matrix that represents the edges; the matrix A^(i) is binary, symmetric, and hollow. Assume that the above-diagonal entries A^(i)_uv, v > u, of the adjacency matrix for graph i are independent Bernoulli random variables, with P^(i)_uv the probability of an edge between vertex u and vertex v, and consolidate these probabilities in the matrix P^(i).
In defining a model for multiple graphs, we adopt a low-rank assumption on the expected adjacency matrices. To leverage the information content across multiple graphs, we further assume common structure, in that the expected adjacency matrices of the independent-edge graphs share a common invariant subspace. We call this the common subspace independent-edge (COSIE) random graph model. Specifically, we say that a collection of m independent random graphs G^(i), 1 ≤ i ≤ m, is distributed according to the bounded rank d COSIE graph model with common subspace V and score matrices R^(i) ∈ R^{d×d} if each of the connection probability matrices P^(i) for G^(i) is given by

P^(i) = V R^(i) V^T, i = 1, ..., m.

Under certain conditions on the rank of the score matrices, the invariant subspace of the model V is identifiable up to an orthogonal transformation (see Proposition 2).
Letting A^(i) denote the adjacency matrix for graph G^(i), we write (A^(1), ..., A^(m)) ∼ COSIE(V; R^(1), ..., R^(m)).
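To make the model concrete, the following minimal sketch simulates a sample of adjacency matrices from the COSIE model; the specific choices of n, m, V, and the score matrices below are hypothetical and for illustration only.

```python
import numpy as np

def sample_cosie(V, Rs, rng):
    """Sample adjacency matrices A^(i) with E[A^(i)] = V R^(i) V^T."""
    graphs = []
    for R in Rs:
        P = V @ R @ V.T                     # edge probability matrix
        A = rng.binomial(1, np.triu(P, 1))  # independent Bernoulli edges above the diagonal
        A = A + A.T                         # symmetric, hollow adjacency matrix
        graphs.append(A)
    return graphs

# Hypothetical example: n vertices, two communities, shared subspace V.
rng = np.random.default_rng(0)
n, m = 200, 4
Z = np.zeros((n, 2)); Z[:n // 2, 0] = 1; Z[n // 2:, 1] = 1
V = Z / np.sqrt(n // 2)                     # orthonormal columns spanning the common subspace
Rs = [(n // 2) * np.array([[0.5, 0.1], [0.1, 0.3 + 0.05 * i]]) for i in range(m)]
As = sample_cosie(V, Rs, rng)
```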
In the COSIE model, there are two primary estimation goals. First, we want to obtain an accurate estimate V̂ of the common subspace V of the unobserved probability matrices P^(i). Second, we want to estimate the individual score matrices R^(i).
COSIE also provides the foundation for the critical (and wide-open) question of multi-sample graph hypothesis testing, whereby we can use these estimates of V and R^(i) to build principled tests to compare different populations of networks.

COSIE is a flexible and estimable multiple graph model
The COSIE model is appropriate for modeling a nontrivial swath of real network phenomena. Indeed, the ubiquitous SBM is COSIE, and COSIE encompasses the important case of a collection of vertex-aligned SBMs (Holland et al., 1983) whose block assignments are the same but whose block probability matrices may differ, as we show in Proposition 1. COSIE also provides a generalization to the multiple-graph setting of latent position models such as the random dot product graph (Young and Scheinerman, 2007). Now, to estimate V and R^(i), we rely on the low-rank structure of the expectation matrices in COSIE. Specifically, we first spectrally decompose each of the adjacency matrices A^(i) to obtain the adjacency spectral embedding (Sussman et al., 2012), defined as

X̂^(i) = V̂^(i) |D̂^(i)|^{1/2},

where D̂^(i) is the d × d diagonal matrix of the top d eigenvalues of A^(i), sorted by magnitude, and V̂^(i) is the n × d matrix of associated orthonormal eigenvectors.
In the COSIE model, we leverage the common subspace structure and use these eigenvector estimates to obtain an improved estimate of the true common subspace V. Specifically, we use V̂^(1), ..., V̂^(m) to build the multiple adjacency spectral embedding (MASE), as follows. We define the n × md matrix Û by

Û = [V̂^(1), V̂^(2), ..., V̂^(m)],

and we let V̂ be the matrix of the top d leading left singular vectors of Û. We then set

R̂^(i) = V̂^T A^(i) V̂, i = 1, ..., m.

The MASE algorithm, then, outputs V̂ and R̂^(1), ..., R̂^(m). Figure 1 provides a graphical illustration of the MASE procedure for a collection of stochastic blockmodels. The key point, here, is that COSIE is a flexible, tractable model for random graphs that retains enough homogeneity (via the common subspace) for ease of estimation, and permits sufficient heterogeneity (via the potentially distinct score matrices R^(i)) to model important collections of different graphs. Athreya et al. (2016); Lyzinski et al. (2014); Sussman et al. (2012) show that X̂^(i) is a consistent, asymptotically normal estimate of the underlying graph parameters for the random dot product graph model (Young and Scheinerman, 2007), a model in which the connection probability matrix P is given by the outer product of a low-rank matrix X with itself. In Levin et al. (2017) it is shown that such a spectral embedding can be profitably deployed for estimation and testing in multiple independent random dot product graphs.
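The construction above can be written in a few lines; the following is a minimal sketch of the unscaled MASE procedure (the function names are ours, and a common dimension d is assumed known):

```python
import numpy as np

def ase(A, d):
    """Unscaled adjacency spectral embedding: top-d eigenvectors of A by eigenvalue magnitude."""
    evals, evecs = np.linalg.eigh(A)
    top = np.argsort(np.abs(evals))[::-1][:d]
    return evecs[:, top]

def mase(As, d):
    """Multiple adjacency spectral embedding: returns Vhat and the score matrices Rhat."""
    U = np.hstack([ase(A, d) for A in As])   # n x (md) concatenation of individual embeddings
    Vhat, _, _ = np.linalg.svd(U, full_matrices=False)
    Vhat = Vhat[:, :d]                       # d leading left singular vectors
    Rhats = [Vhat.T @ A @ Vhat for A in As]  # least-squares score matrices
    return Vhat, Rhats
```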

MASE provides a consistent estimate for the COSIE parameters
We show in Theorem 3 that, under mild assumptions on the sparsity of the graphs, the MASE estimate V̂ is a consistent estimate of V up to an orthogonal transformation; that is, min_{W ∈ O_d} ‖V̂ − V W‖_F converges to zero, where O_d indicates the space of orthogonal matrices of size d. This result employs bounds introduced by Fan et al. (2017) for subspace estimation in distributed covariance matrices.
Furthermore, under delocalization assumptions on V, we can show that the entries of the MASE estimates R̂^(i) are asymptotically normal, with a bias term that decays as m increases. Namely, there exists a sequence of orthogonal matrices W such that, as n → ∞, the entries of R̂^(i) are asymptotically normally distributed around the entries of W^T R^(i) W plus a bias matrix H, where H is a random matrix whose expected norm vanishes as the number of graphs m grows. These results and simulation evidence (see Figure 5 in Section 6) suggest that the multiplicity of graphs in a COSIE sample can profitably reduce the bias in eigenvalue estimation as compared to the recent work of Tang (2018).
The second key point, then, is that MASE provides consistent estimates for the common subspace V and the score matrices R^(i).

MASE estimates yield state-of-the-art performance on subsequent inference tasks
Section 6 examines a number of graph inference procedures, including community detection, graph classification, and testing. Here, we provide a preview: a glimpse of the impact MASE can have on graph hypothesis testing. Hypothesis testing on populations of graphs is a relatively nascent area in which methodologically sound procedures for graph comparison are scarce.
Our goal is to test the hypothesis that for a given pair of random adjacency matrices A^(1) and A^(2), the underlying probability matrices P^(1) and P^(2) are the same. Under the COSIE model, for any pair of matrices P^(1) and P^(2), there exist an embedding dimension d and a matrix with orthonormal columns V ∈ R^{n×d} such that

P^(1) = V R^(1) V^T and P^(2) = V R^(2) V^T.

Therefore, our framework reduces the problem to testing the hypothesis H_0: R^(1) = R^(2).
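As an illustration of how such a test can be carried out, the sketch below computes the Frobenius-norm test statistic ‖R̂^(1) − R̂^(2)‖_F and calibrates it with a simple parametric bootstrap; pooling the two score matrices to form the null is our own simplification, and the exact calibration used in our experiments may differ. It assumes the mase() function sketched in Section 2.

```python
import numpy as np

def test_statistic(Rhat1, Rhat2):
    # Frobenius distance between score matrices; identifiable by Proposition 2.
    return np.linalg.norm(Rhat1 - Rhat2, "fro")

def bootstrap_pvalue(A1, A2, d, n_boot=200, rng=None):
    """Parametric bootstrap: resample both graphs from a fitted null P0 = Vhat R0 Vhat^T."""
    if rng is None:
        rng = np.random.default_rng()
    Vhat, (R1, R2) = mase([A1, A2], d)
    observed = test_statistic(R1, R2)
    R0 = (R1 + R2) / 2                      # pooled score matrix under the null (our simplification)
    P0 = np.clip(Vhat @ R0 @ Vhat.T, 0, 1)
    null_stats = []
    for _ in range(n_boot):
        As = [rng.binomial(1, np.triu(P0, 1)) for _ in range(2)]
        As = [A + A.T for A in As]          # symmetric, hollow adjacency matrices
        _, (B1, B2) = mase(As, d)
        null_stats.append(test_statistic(B1, B2))
    return np.mean(np.array(null_stats) >= observed)
```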
We evaluate the performance of our method and compare it with the omnibus embedding method of Levin et al. (2017), which is one of the few principled methods for this problem in latent space models.
We proceed by generating A^(k) ∼ Bernoulli(P^(k)) from a mixed membership SBM with three communities (Airoldi et al., 2007), with P^(k) = Z^(k) B^(k) (Z^(k))^T, where for each i = 1, ..., n, the membership row Z^(k)_i· that corresponds to vertex i is drawn from a Dirichlet distribution. We simulate two scenarios:

1. Same community assignments, but different connectivity: fix Z^(1) = Z^(2), and vary ‖B^(1) − B^(2)‖_F by increasing B^(2)_11.
2. Different community assignments, same connectivity: fix B^(1) = B^(2), and resample the memberships of t vertices independently while keeping the rest the same.

The first scenario can be represented as a COSIE model with dimension d = d_1 = d_2 = 3, where d_i is the rank of the matrix P^(i); for the second scenario, a larger joint dimension d is required to correctly represent both graphs in the COSIE model. For both MASE and OMNI, we obtain the distribution of the test statistic via Monte Carlo simulation using the correct expected value P^(k) of each graph, and for MASE we also evaluate the parametric bootstrap approach proposed by Tang et al. (2017b), as an alternative for using this method in practice when the probability matrix is not known.

[Figure 2: Empirical power for rejecting, at the 0.05 significance level (black dotted line), the null hypothesis that two graphs sampled from a mixed membership SBM have equal parameters, as a function of ‖B^(1) − B^(2)‖_F. In the left panel, the community assignments Z^(1) and Z^(2) are the same, while the connection probability changes along the x-axis; in the right panel, B^(1) = B^(2) while the community assignments of a few vertices change. Both methods improve their power as the difference between the graphs grows, but MASE effectively uses the common subspace structure to attain greater empirical power.]
Figure 2 shows the result in terms of power for rejecting the null hypothesis that the two graphs are equal.
As the difference between the parameters increases, both methods approach full power. The first scenario shows an advantage for MASE, which exploits the fact that the common subspace is well represented in each graph and has a smaller number of parameters than OMNI. The second scenario is more challenging for MASE, since the number of parameters is increased, but it still shows competitive performance. In both cases, the bootstrap method performed very similarly to the true Monte Carlo power. Thus the third key point is that MASE performs competitively with respect to (and in some cases significantly better than) a number of existing procedures for estimation, community detection, graph classification, and testing in simulated stochastic blockmodels.

COSIE and MASE perform well on real data challenges
COSIE and MASE can be deployed effectively in a pressing real-data problem: that of identifying differences in brain connectivity in medical imaging of human subjects. The graphs we examine arise from the HNU1 study (Zuo et al., 2014), which consists of diffusion magnetic resonance imaging (dMRI) records of 30 different healthy subjects who were each scanned ten times over a period of one month. The pre-processing of the data (see Section 7 for details) resulted in 300 graphs with 200 aligned vertices.
After applying the MASE method to the HNU1 data, we obtain a matrix of joint latent positions V̂ ∈ R^{200×15} and a set of symmetric score matrices {R̂^(i)}_{i=1}^{300} of size 15 × 15, one for each individual graph, yielding 120 free parameters to represent each graph. Figure 3 shows the latent positions of the vertices for the first three dimensions of V̂.
Most of the vertices of the CC200 atlas are labeled according to their spatial location as left or right hemisphere, and this structure is reflected in Figure 3, which suggests that the embedding obtained by MASE is anatomically meaningful.

Now, the matrix D ∈ R^{300×300} of pairwise Frobenius distances, with entries D_ij = ‖R̂^(i) − R̂^(j)‖_F, is invariant to any rotation of V̂ according to Proposition 2. To visualize the geometry imposed by these distances on the space of subjects, we perform multidimensional scaling (MDS) on the matrix D, with 5 dimensions for the scaling.
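A sketch of this distance-plus-scaling pipeline follows (classical MDS implemented via double centering; the dimension choices are illustrative):

```python
import numpy as np

def score_distance_matrix(Rhats):
    """Pairwise Frobenius distances between score matrices (rotation-invariant)."""
    m = len(Rhats)
    D = np.zeros((m, m))
    for i in range(m):
        for j in range(i + 1, m):
            D[i, j] = D[j, i] = np.linalg.norm(Rhats[i] - Rhats[j], "fro")
    return D

def cmds(D, k=2):
    """Classical multidimensional scaling of a distance matrix D into k dimensions."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J              # double-centered squared distances
    evals, evecs = np.linalg.eigh(B)
    idx = np.argsort(evals)[::-1][:k]
    return evecs[:, idx] * np.sqrt(np.maximum(evals[idx], 0))
```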
Figure 4 shows scatter plots of the positions discovered by MDS for a subset of the graphs corresponding to 5 subjects, chosen because the distances between them represent a useful snapshot of the variability in the data.We observe that the points representing graphs of the same subject usually cluster together.The location of the points corresponding to subjects 12 and 14 suggests some similarity between their connectomes, while subject 13 shows a larger spread, and hence more variability in the brain network representations.
To conclude, the final key point is that COSIE and MASE are robust to the challenges of real data, and provide domain-relevant insights for real data analysis.

The model: Common subspace independent-edge random graphs (COSIE)
To model a sample of m graphs with n aligned vertices, we consider an independent-edge random graph framework, in which the edges of a graph A are conditionally independent given a probability matrix P ∈ [0, 1]^{n×n}: each entry P_uv denotes the probability of the Bernoulli random variable representing an edge between the corresponding vertices u and v, and hence the probability of observing a graph A is given by

P(A | P) = ∏_{u<v} P_uv^{A_uv} (1 − P_uv)^{1−A_uv}.

We also write E[A | P] = P. This framework encompasses many popular statistical models for graphs (Bickel and Chen, 2009; Hoff et al., 2002; Holland et al., 1983), which differ in the way the structure of the matrix P is defined.
In the single-graph setting, imposing some structure on the matrix P is necessary to make the estimation problem feasible, as there is only one graph observation. Under this framework, we can characterize the distribution of a sample of graphs A^(1), ..., A^(m) by modeling their expectations P^(1), ..., P^(m). As our goal is to model a heterogeneous population of graphs with possibly different distributions, we do not assume that the expected matrices are all equal, so, as in the single-graph setting, further assumptions on the structure of these matrices are necessary.
To introduce our model, we start by reviewing some models for a single graph that motivate our approach. One such model is the random dot product graph (RDPG), defined below.
Definition 1 (Random dot product graph model (Young and Scheinerman, 2007)). Let X = (X_1, ..., X_n)^T ∈ R^{n×d} be a matrix such that the inner product of any two rows satisfies 0 ≤ X_u^T X_v ≤ 1. We say that a random adjacency matrix A is distributed as a random dot product graph with latent positions X, and write A ∼ RDPG(X), if the conditional distribution of A given X is

P(A | X) = ∏_{u<v} (X_u^T X_v)^{A_uv} (1 − X_u^T X_v)^{1−A_uv}.

The RDPG is a type of latent space model (Hoff et al., 2002) in which the rows of X correspond to latent positions of the vertices, and the probability of an edge between a pair of vertices is given by the inner product of their corresponding latent vectors, which gives an appealing interpretation of the matrix of latent positions X. Under the RDPG model, the matrix of edge probabilities satisfies P = XX^T, so the RDPG framework contains the class of independent-edge graph distributions for which P is a positive semidefinite matrix with rank at most d.
More recently, Rubin-Delanchy et al. (2017) introduced the generalized RDPG (GRDPG) model, which extends this framework to the whole class of matrices P with rank at most d by introducing a d × d diagonal matrix I whose diagonal entries are either 1 or −1. Given X and I, the edge probabilities are modeled as P = XIX^T, which keeps the interpretation of the latent space while extending the class of graphs that can be represented within this formulation.
The stochastic blockmodel (SBM) (Holland et al., 1983) is a particular example of a RDPG in which there are only K < n different rows in X, aiming to model community structure in a graph. In a SBM, vertices are partitioned into K different groups, so each vertex u ∈ V has a corresponding label z_u ∈ {1, ..., K}. The probability of an edge appearing between two vertices depends only on their corresponding labels, and these probabilities are encoded in a matrix B ∈ R^{K×K} such that

P_uv = B_{z_u z_v}.

To write the SBM as a RDPG, let Z ∈ {0, 1}^{n×K} be a binary matrix that denotes the community memberships of the vertices, with Z_uk = 1 if z_u = k, and 0 otherwise. Then the edge probability matrix of the SBM is

P = Z B Z^T,   (1)

and we denote A ∼ SBM(Z, B). Let B = W D W^T be the eigendecomposition of B. From this representation, it is easy to see that if we write X = Z W |D|^{1/2}, then the distribution of a SBM corresponds to a GRDPG graph with latent positions X. In particular, if B is a positive semidefinite matrix, this is also a RDPG. Multiple extensions of the SBM have been proposed in order to produce a more realistic and flexible model, including degree heterogeneity (Karrer and Newman, 2011), multiple community membership (Airoldi et al., 2007), and hierarchical partitions (Lyzinski et al., 2017). These extensions usually fall within the same framework of Equation (1) by placing different constraints on the matrix Z, and hence they can also be studied within the RDPG model.
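The mapping from SBM parameters to (G)RDPG latent positions can be verified numerically; the following sketch uses an arbitrary two-block B of our own choosing:

```python
import numpy as np

# Hypothetical 2-block example: map SBM parameters (Z, B) to GRDPG latent positions.
B = np.array([[0.6, 0.2],
              [0.2, 0.4]])
evals, W = np.linalg.eigh(B)                 # eigendecomposition B = W D W^T
Z = np.zeros((10, 2)); Z[:5, 0] = 1; Z[5:, 1] = 1
X = Z @ W @ np.diag(np.sqrt(np.abs(evals)))  # X = Z W |D|^{1/2}

# Check: X I X^T recovers P = Z B Z^T, where I = diag(sign of eigenvalues).
I = np.diag(np.sign(evals))
assert np.allclose(X @ I @ X.T, Z @ B @ Z.T)
```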
In defining a model for multiple graphs, we adopt a low-rank assumption on the expected adjacency matrices, as the RDPG model and all the other special cases do. To leverage the information of multiple graphs, a common structure among them is necessary. We thus assume that all the expected adjacency matrices of the independent-edge graphs share a common invariant subspace, but allow each individual matrix to be different within that subspace.
Definition 2 (Common Subspace Independent-Edge graphs, a.k.a. COSIE). Let V ∈ R^{n×d} be a matrix with orthonormal columns, and let R^(1), ..., R^(m) ∈ R^{d×d} be symmetric matrices. We say that the random adjacency matrices A^(1), ..., A^(m) are jointly distributed according to the common subspace independent-edge graph model with bounded rank d and parameters V and R^(1), ..., R^(m) if, for each i = 1, ..., m, given V and R^(i), the entries of A^(i) are independent and distributed according to

A^(i)_uv ∼ Bernoulli((V R^(i) V^T)_uv), for u < v.

We denote by (A^(1), ..., A^(m)) ∼ COSIE(V; R^(1), ..., R^(m)) the joint model for the adjacency matrices.
Under the COSIE graph model, the expected adjacency matrices of the graphs share a common invariant subspace V that can be interpreted, upon scaling by the score matrices, as containing the common joint latent positions of the m graphs. This invariant subspace can only be identified up to an orthogonal transformation, so the interpretation of the latent positions is preserved. Note that each graph is marginally distributed as a GRDPG. Writing the eigendecomposition of each score matrix as R^(i) = W^(i) D^(i) (W^(i))^T, where W^(i) is orthogonal and D^(i) is a diagonal matrix containing the eigenvalues, the corresponding latent positions of graph i in the GRDPG model are given by X^(i) = V W^(i) |D^(i)|^{1/2}. The model thus introduces individual score matrices R^(1), ..., R^(m) that control the connection probabilities between the edges of each graph. When an R^(i) is diagonal, its entries are the eigenvalues of the corresponding probability matrix P^(i), but in general R^(i) does not have to be diagonal. Formal identifiability conditions of the model are discussed in Section 3.1.
The COSIE model can capture any distribution of multiple graphs with aligned vertices: for any set of probability matrices P^(1), ..., P^(m), there exists an embedding dimension d ≤ n such that those matrices can be represented with the COSIE model. Indeed, it is trivial to note that if d = n, we can set V = I and R^(i) = P^(i). However, for many classes of interest, the embedding dimension d necessary to exactly or approximately represent the graphs is usually much smaller than n. In those cases, the COSIE model can effectively reduce the dimensionality of the problem from O(mn^2) parameters to only O(nd + md^2).
To motivate our model, we consider the multilayer stochastic blockmodel for multiple graphs, originally introduced by Holland et al. (1983). In this model, the community labels of the vertices remain fixed across all the graphs in the population, but the connection probabilities between and within communities can differ on each graph. The parsimony and simplicity of the model, while maintaining heterogeneity across the graphs, has allowed its use in different statistical tasks, including community detection (Han et al., 2014), multiple graph inference (Pavlović et al., 2019), and modeling time-varying networks (Matias and Miele, 2017). Formally, the model is defined as follows.
Definition 3 (Multilayer stochastic blockmodel (Holland et al., 1983)). Let Z ∈ {0, 1}^{n×K} be a membership matrix such that each row has exactly one entry equal to one, and let B^(1), ..., B^(m) ∈ [0, 1]^{K×K} be symmetric matrices. The random adjacency matrices A^(1), ..., A^(m) are jointly distributed as a multilayer SBM, denoted by (A^(1), ..., A^(m)) ∼ SBM(Z; B^(1), ..., B^(m)), if each A^(i) is independently distributed as A^(i) ∼ SBM(Z, B^(i)).

The next proposition formalizes our intuition about the semiparametric aspect of the model. Namely, the COSIE graph model can represent a multilayer SBM with K communities using a dimension d that is at most K. The proof can be found in the appendix.
Proposition 1. Suppose that Z ∈ {0, 1}^{n×K} and B^(1), ..., B^(m) ∈ [0, 1]^{K×K} are the parameters of the multilayer SBM. Then, for some d ≤ K, there exist a matrix with orthonormal columns V ∈ R^{n×d} and symmetric matrices R^(1), ..., R^(m) ∈ R^{d×d} such that Z B^(i) Z^T = V R^(i) V^T for all i ∈ [m].

The previous result shows that the multilayer SBM is a special case of the COSIE model. Conversely, if V ∈ R^{n×d} is the invariant subspace of the COSIE model and V has only K different rows, then the COSIE model is equivalent to the multilayer SBM. Furthermore, several extensions of the single-graph SBM can be translated directly into the multilayer setting, and these are also contained within the COSIE model. By allowing the matrix Z to have more than one non-zero value on each row, overlapping memberships can be incorporated (Latouche et al., 2011), and if the rows of Z are nonnegative real numbers such that ∑_{k=1}^K Z_uk = 1, then we obtain an extension of the mixed membership model (Airoldi et al., 2007), which can further incorporate degree heterogeneity by multiplying the rows of Z by a constant (Zhang et al., 2014). More broadly, if the rows of V in a COSIE graph are characterized by a hierarchical structure, then the graphs correspond to a multilayer extension of the hierarchical SBM (Lyzinski et al., 2017). Some of these extensions have not been presented before, and hence our work provides a route for studying these models, which are interesting in various applications.
Other latent space approaches for the multiple-graph setting have been presented before, but they either tend to limit the heterogeneity in the distributions across the graphs or include a large number of parameters, which complicates scalability and interpretation. Levin et al. (2017) present a method to estimate the latent positions for a set of graphs. Although the method can in principle obtain different latent positions for each graph without requiring any further Procrustes alignment, it is only studied under a joint RDPG model that assumes that the latent positions of all the graphs are the same. Wang et al. (2017) introduced a semiparametric model for graphs that is able to effectively handle heterogeneous distributions. However, their model usually requires a larger number of parameters than COSIE to represent the same distributions, which complicates interpretation, and the non-convexity of the associated optimization problem makes estimation harder. In fact, an equivalent statement to Proposition 1 would require embedding the graphs in a latent space of dimension O(K^2), compared to O(K) for COSIE. Other related models based on tensor decompositions (Wang et al., 2019; Zhang et al., 2019) present similar challenges in estimation and interpretation. More recently, Nielsen and Witten (2018) studied a multiple RDPG model that is a special case of the model of Wang et al. (2017) obtained by imposing further identifiability constraints. These constraints limit the ability of their model to represent heterogeneous distributions, and, in fact, it is not possible to obtain an equivalent statement to Proposition 1 under this model. Bayesian formulations of this problem have also been introduced (Durante and Dunson, 2018; Durante et al., 2017), but the computational methods used to fit these models limit their applicability to much smaller graphs, with vertices numbering only in the dozens.

Identifiability
In the COSIE model, note that any orthogonal transformation W ∈ R^{d×d} of the parameters keeps the probability matrix of the model unchanged. Indeed, observe that

V R^(i) V^T = (V W)(W^T R^(i) W)(V W)^T,

and therefore the most we can hope for is that the parameters of the model are identifiable within the equivalence class

{(V W, W^T R^(1) W, ..., W^T R^(m) W) : W ∈ O_d}.

This non-identifiability is unavoidable in many latent space models, including the RDPG. However, with multiple graphs the situation is more nuanced, as we do not require the probability matrix of each graph to have the same rank. Note that Definition 2 does not restrict the matrices R^(1), ..., R^(m) to be full rank, so the rank of each individual graph can be smaller than the dimension of the joint model.
The following proposition characterizes the identifiability of the model. The proof is given in the appendix.
Proposition 2 (Model identifiability). Let V ∈ R^{n×d} be a matrix with orthonormal columns and R^(1), ..., R^(m) be symmetric matrices such that these are the parameters of the bounded rank d COSIE model.

a) For any pair of indices i, j ∈ [m], the pairwise spectral and Frobenius distances between score matrices, ‖R^(i) − R^(j)‖_2 and ‖R^(i) − R^(j)‖_F, are identifiable.

b) If the concatenated matrix R = [R^(1) ⋯ R^(m)] ∈ R^{d×md} has rank d, then V is identifiable up to an orthogonal transformation.

c) Given a basis V of the common invariant subspace, the score matrices R^(1), ..., R^(m) are identifiable.
The previous results present identifiability characterizations at different levels. At the weakest level, part a) of Proposition 2 ensures that even if the parameters are not identifiable, the Frobenius and spectral distances between the score matrices of the graphs are unique. This property allows the use of distance-based methods for any subsequent inference in multiple-graph problems, such as multidimensional scaling, k-means, or k-nearest neighbors; some examples are shown in Sections 6 and 7.
Proposition 2 also provides an identifiability condition for the invariant subspace V. The matrices R^(i) do not need to have the same rank, but the joint model may require a larger dimension to represent all the graphs, which is given by the rank of the concatenated matrix R. To illustrate this scenario, consider a multilayer SBM with 3 communities. Let B^(1), B^(2) ∈ R^{3×3} be symmetric connectivity matrices, each of rank 2, and Z ∈ {0, 1}^{n×3} a community membership matrix. Although the model is represented with three communities, each matrix B^(i) is rank 2, so each graph individually can in fact be represented with only two communities. However, if the concatenated matrix [B^(1) B^(2)] has full rank (i.e., rank 3), the three communities of the model can be identified in the joint model, and thus this can be represented as a rank-3 COSIE model.
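The following sketch illustrates this rank phenomenon numerically; the matrices B^(1) and B^(2) below are our own example (not the specific ones used in the paper), each symmetric of rank 2 while their concatenation has rank 3:

```python
import numpy as np

a, b = 0.9, 0.1
B1 = np.array([[a, a, b],
               [a, a, b],
               [b, b, a]])
B2 = np.array([[a, b, b],
               [b, a, a],
               [b, a, a]])

print(np.linalg.matrix_rank(B1))                   # 2: graph 1 alone fits a 2-dimensional model
print(np.linalg.matrix_rank(B2))                   # 2: so does graph 2
print(np.linalg.matrix_rank(np.hstack([B1, B2])))  # 3: jointly, all three communities are identifiable
```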
The last part of Proposition 2 shows that the individual parameters R^(i) of a graph are only identifiable with respect to a given basis of the eigenspace V, and hence any interpretation derived from them depends on the choice of V. Some instances of the COSIE model, including the multilayer SBM, provide specific characterizations of R^(i) that can facilitate further interpretation of the parameters.

Fitting COSIE by spectral embedding of multiple adjacency matrices
This section presents a method to fit the model of Definition 2 to a sample of m adjacency matrices A^(1), ..., A^(m). Fitting the model requires an estimator V̂ of the common subspace V, for which we present a spectral approach. Given an estimated common subspace, we estimate the individual score matrices R^(i) of each graph by least squares, which has a simple solution in our setting. In all of our analysis, we assume that the dimension of the model d is known in advance, but we also present a method to estimate it in practice.
We start by defining the adjacency spectral embedding (ASE) of an adjacency matrix A, which is a standard tool for estimating the latent positions of a RDPG.
Definition 4 (Adjacency spectral embedding (Rubin-Delanchy et al., 2017; Sussman et al., 2012)). For an adjacency matrix A ∈ R^{n×n} and an embedding dimension d, let V̂ ∈ R^{n×d} denote the matrix of d orthonormal eigenvectors of A associated with its d largest eigenvalues in magnitude, and let D̂ be the d × d diagonal matrix containing those eigenvalues. The scaled adjacency spectral embedding of A is defined as X̂ = V̂ |D̂|^{1/2}. We refer to V̂ as the unscaled adjacency spectral embedding, or simply as the leading eigenvectors of A.
The ASE provides a consistent and asymptotically normal estimator of the corresponding latent positions under the RDPG and GRDPG models as the number of vertices n increases (Athreya et al., 2017; Rubin-Delanchy et al., 2017; Sussman et al., 2014). Therefore, under the COSIE model, for which P = V R V^T, the matrix V̂ obtained by ASE is a consistent estimator of V, up to an orthogonal transformation, provided that the corresponding R has full rank. This suggests that, given a sample of adjacency matrices, one can simply use the unscaled ASE of any single graph to obtain an estimator of V. However, this method does not leverage the information about V in all the graphs.
To give an intuition for how to fit the joint model, we first consider working with the expected probability matrix P^(i) of each graph. For simplicity, in all our analysis here we assume that all the matrices R^(1), ..., R^(m) have full rank, so each matrix P^(i) has exactly d non-zero eigenvalues. Let V^(i) and X^(i) be the unscaled and scaled ASE of the matrix P^(i). Then V^(i) = V W^(i) for some orthogonal matrix W^(i) ∈ R^{d×d}, and X^(i) = V W^(i) |D^(i)|^{1/2} for a diagonal matrix D^(i) containing the non-zero eigenvalues of P^(i). Note that the matrices W^(i) are possibly different for each graph, which is problematic when we want to leverage the information of all the graphs. Consider the matrix formed by concatenating the unscaled ASEs of each graph,

U = [V^(1) ⋯ V^(m)],

and, alternatively, the same construction but concatenating the scaled ASEs,

Ũ = [X^(1) ⋯ X^(m)],

both of size n × (dm). Note that both matrices U and Ũ have rank d, and the d leading left singular vectors of either of them correspond to the common subspace V, up to some orthogonal transformation. The SVD step effectively aligns the multiple ASEs to a common spectral embedding. In the scaled version, the SVD also eliminates the effect of the eigenvalues of each graph, which are nuisance parameters in the common subspace estimation problem.
In practice, we do not have access to the expected matrices P^(i), so we apply the procedure described above to the adjacency matrices to obtain an estimator V̂ of the common subspace. Given V̂, we estimate the individual parameters of each graph by minimizing a least squares criterion, whose solution has the closed form

R̂^(i) = V̂^T A^(i) V̂.

We call this procedure the multiple adjacency spectral embedding (MASE), and it is summarized in Algorithm 1.
The MASE method for estimating the parameters of the COSIE model relies only on singular value decompositions, and hence it is simple and computationally scalable. The method extends the ideas of a single spectral graph embedding to the multiple-graph setting. The first step of Algorithm 1, which is typically the main computational burden of the method, can be carried out in parallel. The estimation of the score matrices requires only an estimate of V, and hence it is easy to compute an out-of-sample embedding for a new graph A^(m+1) without having to update all the parameter estimates.
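Concretely, the out-of-sample step is a single projection; a minimal sketch (the function name is ours):

```python
import numpy as np

def out_of_sample_score(A_new, Vhat):
    """Score matrix for a new graph, projected onto the previously estimated common subspace.
    No re-fitting of Vhat or of the existing score matrices is needed."""
    return Vhat.T @ A_new @ Vhat
```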

Algorithm 1 Multiple adjacency spectral embedding (MASE)
Input: Sample of graphs A^(1), ..., A^(m); embedding dimensions d and {d_i}_{i=1}^m.
1. For each i ∈ [m], obtain the unscaled adjacency spectral embedding of A^(i) in d_i dimensions, and denote it by V̂^(i).
2. Form the concatenated matrix Û = [V̂^(1) ⋯ V̂^(m)].
3. Define V̂ ∈ R^{n×d} as the matrix containing the d leading left singular vectors of Û.
4. For each i ∈ [m], compute R̂^(i) = V̂^T A^(i) V̂.
Output: V̂ and R̂^(1), ..., R̂^(m).
Figure 1 shows a graphical representation of MASE for estimating the common subspace of a set of four graphs.
The graphs correspond to a multilayer SBM with two communities, and the connection matrices B^(i) are selected so that the graphs have heterogeneous structure within the sample. Note that after step 1 of Algorithm 1, the latent positions obtained by ASE are slightly rotated relative to one another across graphs. After estimating the common subspace by SVD, a common set of latent positions is found, which appear tightly clustered within their communities, showing that MASE is able to leverage the information of all the graphs in this example.
The first step of Algorithm 1 corresponds to a separate ASE of each graph. We stated the algorithm using the unscaled version of the ASE, and later, in Section 5, we show that this version of the method is consistent in estimating the common subspace V. In practice, this step can be replaced with other spectral embeddings of A^(i), including the scaled version described above. Note that the scaled ASE also makes use of the eigenvalues, and thus it puts more weight on the columns of the matrix Û that correspond to the eigenvectors associated with the largest eigenvalues, which can be convenient, especially if the dimension d is overestimated. Other spectral embeddings that also aim to estimate the eigenspace of the adjacency matrices, including the Laplacian spectral embedding or regularized versions of it (Le et al., 2017; Priebe et al., 2019), might be preferred in some circumstances, but we do not explore this direction further.
The method for estimating the common invariant subspace is related to other methods in the literature. Crainiceanu et al. (2011) proposed a population singular value decomposition method for representing a sample of rectangular matrices with the same dimensions, using it to study arrays of images. In different work, Fan et al. (2017) introduced a method for estimating the principal components of distributed data. Their method computes the leading eigenvectors of the covariance matrix on each server and then obtains the leading eigenvectors of the average of the subspace projections, which is shown to converge to the solution of PCA at the same rate as if the data were not distributed. Both methods correspond to the unscaled ASE version of MASE. In our case, we show that this particular way of estimating the parameters of the COSIE model results in a consistent estimator of the common invariant subspace and asymptotically normal estimators of the individual parameters.

Choice of embedding dimension
In practice, the dimension of the common subspace d is usually unknown. Moreover, each individual R^(i) might not be full rank, so estimating a different embedding dimension d_i for each graph might be necessary.
The actual values of d and {d_i} correspond to the ranks of U and {P^(i)}, respectively, and so they can be approximated by estimating the ranks of Û and {Â^(i)}. The scree plot method, which consists of looking for an elbow in the plot of the ordered singular values of a matrix, provides an estimator of these quantities, and can be performed automatically using the method proposed by Zhu and Ghodsi (2006). We use this method to fit the model to real data in Section 7.
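A simple implementation of this automatic elbow selection, under our reading of the profile-likelihood criterion of Zhu and Ghodsi (2006), is sketched below:

```python
import numpy as np

def zhu_ghodsi_elbow(sv):
    """Profile-likelihood elbow of a sorted singular value sequence (Zhu and Ghodsi, 2006).
    Splits the values into two groups at each candidate index, fits Gaussians with a
    common variance, and returns the split maximizing the profile log-likelihood."""
    sv = np.sort(np.asarray(sv, dtype=float))[::-1]
    n = len(sv)
    lik = np.full(n - 1, -np.inf)
    for q in range(1, n):
        s1, s2 = sv[:q], sv[q:]
        mu1, mu2 = s1.mean(), s2.mean()
        var = (np.sum((s1 - mu1) ** 2) + np.sum((s2 - mu2) ** 2)) / n  # pooled variance
        var = max(var, 1e-12)                                          # numerical floor
        lik[q - 1] = (-0.5 * np.sum((s1 - mu1) ** 2) / var
                      - 0.5 * np.sum((s2 - mu2) ** 2) / var
                      - 0.5 * n * np.log(var))
    return int(np.argmax(lik)) + 1  # estimated rank / embedding dimension
```

For example, applying this function to the singular values of Û gives an estimate of d, and applying it to the singular values of each A^(i) gives the individual dimensions d_i.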

Theoretical results
In this section, we study the statistical performance of Algorithm 1 in estimating the parameters of the COSIE model from a sample of graphs. We first study the expected error in estimating the common subspace, and show that this error decreases as a function of the number of graphs. This result demonstrates that our method is able to leverage the information of the multiple graphs to improve the estimation of V.
To study the estimation error of our method, we consider a sequence of COSIE parameters (V_n, R_n^(1), ..., R_n^(m)) indexed by the number of vertices n. The magnitude of the entries of the parameters typically changes with n, and in order to obtain consistent estimators we will require that ‖R_n^(i)‖ → ∞. This is a natural assumption considering the fact that R_n^(i) contains the eigenvalues of the expected adjacency matrix of a graph, and requiring these eigenvalues to grow as n increases is necessary in order to control the error of ASE (Athreya et al., 2017). In the following, we omit the dependence of the parameters on n. To simplify the analysis, we assume that all the score matrices R^(1), ..., R^(m) have full rank.
Our next theorem introduces a bound on the expected error in estimating the common subspace of the COSIE model, which is the basis of our theoretical results. The proof of this result relies on bounds from Fan et al. (2017) for studying the eigenvector estimation error in distributed PCA. However, our setting presents some substantial differences, including the distribution of the data and the fact that the expected adjacency matrices are not jointly diagonalizable. Moreover, the scaled version of the algorithm adds the complication of working with eigenvalues, and extending the theory to this case is more challenging. We thus work with the unscaled version of the ASE in Algorithm 1. Before stating the theorem, we introduce some notation. For a given square symmetric matrix M ∈ R^{r×r}, denote by λ_1(M) ≥ ... ≥ λ_r(M) the ordered eigenvalues of M, define λ_min(M) and λ_max(M) as the smallest and largest eigenvalues in magnitude of M, and let δ(M) = max_{u∈[r]} ∑_{v=1}^r M_uv be the largest row sum of M.
Theorem 3. Let R^(1), ..., R^(m) be a collection of full rank symmetric matrices of size d × d, V ∈ R^{n×d} a matrix with orthonormal columns, and (A^(1), ..., A^(m)) ∼ COSIE(V; R^(1), ..., R^(m)) a sample of m random adjacency matrices, and let ε denote the average subspace estimation error incurred by the individual adjacency spectral embeddings of A^(1), ..., A^(m). Let V̂ be the estimator of V obtained by Algorithm 1. Suppose that min_{i∈[m]} δ(P^(i)) = ω(log n) and ε = o(1). Then,

E[ min_{W∈O_d} ‖V̂ − V W‖_F ] ≤ C ( ε/√m + ε² )   (3)

for some constant C. The above theorem requires two conditions on the expected adjacency matrices, which control the estimation error of each single-graph ASE. First, the largest expected vertex degree δ(P^(i)) needs to grow at rate ω(log n) to ensure the concentration in spectral norm of the adjacency matrix around its expectation (see Theorem 20 of Athreya et al. (2017)). Second, the condition ε = o(1) is required to control the average estimation error of each ASE.
For this condition to hold, it is enough that the expected degrees δ(P^(i)) and the smallest eigenvalues of the matrices P^(i) grow sufficiently fast. Specific conditions for the multilayer SBM that relate the community sizes and the eigenvalues of the connection probability matrices are presented in Section 5.1. Theorem 3 bounds the expected error in estimating the subspace of V by the sum of the two terms in Equation (3). To understand these terms, note that the error can be bounded, up to constants, as

min_{W∈O_d} ‖V̂ − V W‖_F ≲ ‖V̂ V̂^T − E[V̂ V̂^T]‖_F + min_{W∈O_d} ‖Ṽ − V W‖_F,   (4)

where Ṽ is the matrix containing the d leading eigenvectors of E[V̂ V̂^T]. The first term on the right-hand side of Equation (4) measures the variance of the estimated projection matrix, which corresponds to the first term in Equation (3); thus, this term converges to zero as the number of graphs increases. The estimated projection is not unbiased, and therefore the second term of Equation (4) is always positive, but its order is usually smaller than that of the variance. This bound is a function of ε, which quantifies the average error in estimating V by using the leading eigenvectors of each A^(i). Note that the inclusion of an orthogonal transformation W reflects the fact that the common subspace is only identifiable up to such a transformation.
The above result establishes that when the sample size m is small (m ≲ 1/ε), the variance term dominates the error, as observed in the first term on the right-hand side of Equation (3), but as the sample size increases this term vanishes.
The second term is the bias incurred by estimating V with the leading eigenvectors of each network, and this term dominates when the sample size is large. These types of results are common in settings where a global estimator is obtained from the average of local estimators (Arroyo and Hou, 2016; Fan et al., 2017; Lee et al., 2017).
To illustrate the result of Theorem 3, consider the following example based on the Erdős-Rényi (ER) model (Erdős and Rényi, 1959; Gilbert, 1959). In the ER random graph model G(n, p), the edges between any pair of the n vertices are formed independently with a constant probability p. We consider a sample of adjacency matrices A^(1), ..., A^(m), each having a different edge probability p_1, ..., p_m, so that P^(i) = p_i 1_n 1_n^T, where 1_n is the n-dimensional vector with all entries equal to one. We can represent these graphs under the COSIE model by setting V = (1/√n) 1_n and R^(i) = n p_i. In the dense regime, the error of estimating the common subspace is of order O(1/√(mn) + 1/n). An alternative estimator of the common subspace V is given by the ASE of the mean adjacency matrix, ASE((1/m) ∑_{i=1}^m A^(i)), for which the expected subspace estimation error is O(1/√(mn)) (Tang et al., 2018). When m ≲ n, the error rates of the two estimators coincide. However, the estimator of Tang et al. (2018) is only valid when all the graphs have the same expected adjacency matrices, and it can perform poorly in heterogeneous populations of graphs (see Section 6). The ER model described above can be regarded as a special case of the multilayer SBM with only one community, which we consider next.

Perfect recovery in community detection
Estimation of the common subspace in the multilayer SBM is of particular interest due to its relation to the community detection problem (Abbe, 2017; Girvan and Newman, 2002). In the multilayer SBM with K communities, the matrix of the common subspace basis V has K different rows, each corresponding to a different community. An accurate estimator V̂ will then reflect this structure, and when the community labels of the vertices are not known, clustering the rows of V̂ into K groups can reveal these labels. Different clustering procedures can be used for this goal; in this section we focus on K-means clustering, defined next. Let Z be the set of valid membership matrices for n vertices and K communities, that is,

Z = {Z ∈ {0, 1}^{n×K} : ∑_{k=1}^K Z_uk = 1 for all u ∈ [n]}.

The community assignments are obtained by solving the K-means problem

(Ẑ, Ĉ) = argmin_{Z∈Z, C∈R^{K×d}} ‖Z C − V̂‖_F².   (5)

The matrix Ẑ thus contains the estimated community assignments, and Ĉ their corresponding centroids.
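A sketch of the resulting multiple-graph spectral clustering procedure, using the mase() function sketched earlier and scikit-learn's K-means in place of an exact solver for Equation (5):

```python
import numpy as np
from sklearn.cluster import KMeans

def mase_community_detection(As, d, K, seed=0):
    """Cluster the rows of the estimated common subspace to recover community labels."""
    Vhat, _ = mase(As, d)                    # mase() as sketched in Section 2
    km = KMeans(n_clusters=K, n_init=10, random_state=seed).fit(Vhat)
    return km.labels_                        # estimated community assignment per vertex
```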
To understand the community detection performance of the procedure described above, consider a multilayer SBM with parameters B^(1), ..., B^(m) ∈ [0, 1]^{K×K} and Z ∈ {0, 1}^{n×K}. Denote the community sizes by n_1, ..., n_K, so that ∑_{j=1}^K n_j = n, and define the quantities Ξ = diag(n_1, ..., n_K), n_min = min_{j∈[K]} n_j, n_max = max_{j∈[K]} n_j, and P^(i) = Z B^(i) Z^T. To obtain a consistent estimator of the common invariant subspace, the following assumption is required.
Assumption 1. There exist absolute constants κ ∈ (0, 1] and γ > 0 that, for all i ∈ [m], jointly control the ratio of the community sizes and the magnitude of the smallest eigenvalue of the connectivity matrix B^(i).

The previous assumption requires uniform control of the ratios of the community sizes and the eigenvalues of the connectivity matrices. When all communities have comparable sizes (i.e., n_min ≥ c n_max for some absolute constant c > 0), the smallest eigenvalue of each B^(i) is allowed to approach zero at a rate governed by κ. On the other hand, if the smallest eigenvalue of each B^(i) is bounded away from zero by an absolute constant, the assumption instead constrains how fast the size of the largest community can grow. The next corollary is a consequence of Theorem 3 for the multilayer SBM setting. The proof is given in the appendix.
Corollary 4. Consider a sample of m adjacency matrices from the multilayer SBM, (A^(1), ..., A^(m)) ∼ SBM(Z; B^(1), ..., B^(m)), and let V̂ be the estimated subspace from Algorithm 1. Suppose that n_min = ω(1), δ(P^(i)) = ω(log n), and that Assumption 1 holds. Then, for sufficiently large n, the expected subspace estimation error E[min_{W∈O_d} ‖V̂ − V W‖_F] converges to zero.

The above corollary presents sufficient conditions for consistent estimation of the common invariant subspace, which contains the information on the community labels encoded in the matrix Z. The corollary requires the size of the smallest community to grow as n increases, which is commonly assumed in the literature (Lyzinski et al., 2014; Rohe et al., 2011).
The following result provides a bound on the expected number of misclustered vertices after clustering the rows of V̂ according to Equation (5). Due to the non-identifiability of the community labels, these are recovered up to a permutation Q ∈ P_K, where P_K is the set of K × K permutation matrices.
Theorem 5. Suppose that Ẑ is the membership matrix obtained from Equation (5). Under the conditions of Corollary 4, there exist absolute constants γ > 0 and κ ∈ (0, 1] such that the expected number of misclustered vertices, E[min_{Q∈P_K} |{u ∈ [n] : (ẐQ)_u· ≠ Z_u·}|], is bounded by the sum of two terms (Equation (7)).

Similar to the estimation error of the common subspace in Theorem 3, the bound on the expected clustering error depends on two terms in Equation (7). When the number of graphs m increases, the first term vanishes.
The second term also decays to zero as long as K n_max = o(n_min^{2κ}), and therefore the algorithm recovers the community labels correctly as n goes to infinity, provided that m = Ω(n_min^κ). On the other hand, our theory does not guarantee perfect community recovery for m = O(1), because the first term in Equation (7) is at least of constant order. We remark that this fact is a technical consequence of our analysis of the subspace estimation error, and it is analogous to similar results obtained for spectral clustering on a single graph (Rohe et al., 2011). In contrast, previous work by Lyzinski et al. (2014) shows, via a careful analysis of the row-wise error in subspace estimation, that spectral methods can produce perfect recovery in community detection in the single-graph setting. Developing a similar result in our setting requires further investigation that we leave for future work.
MASE provides a simple extension of spectral clustering to the multiple-graph setting when all graphs share the same community structure. This community detection method is scalable and can effectively handle the heterogeneity of the graphs. Other spectral approaches for this problem rely on averaging the adjacency matrices (Bhattacharyya and Chatterjee, 2018; Han et al., 2014) or a transformation of them (Bhattacharyya and Chatterjee, 2017), which requires stronger assumptions on the connection probability matrices or can increase the computational cost. Other approaches, including likelihood and modularity maximization (Han et al., 2014; Mucha et al., 2010; Peixoto, 2015), require solving large combinatorial problems, which usually limits their scalability.

Asymptotic normality of the score matrices
Now we study the asymptotic distribution of the individual estimates of the score matrices. We show not only that the entries of R̂^(i) accurately approximate the entries of R^(i) up to an orthogonal transformation, but also that, under delocalization assumptions on V, the entries of R̂^(i) are asymptotically normal. In effect, we prove a central limit theorem with a bias term, but this bias term decays as the number of graphs m increases. Our central limit theorem depends, in essence, on the independence of the errors between the adjacency matrices A^(i) and the probability matrices P^(i), and on our ability to estimate the common subspace V to sufficient precision. As such, it exemplifies the noteworthy balance the COSIE model achieves between the homogeneity of the common subspace across graphs, which allows us to leverage probabilistic regularity for estimation accuracy, and the heterogeneity of the score matrices. The former enables us to infer the latter.
To make our central limit theorem precise, we require the following delocalization and edge variance assumptions.
Assumption 2 (Delocalization of V). There exist constants c_1, c_2 > 0 and an orthogonal matrix W ∈ O_d such that each entry of V W satisfies

c_1 n^{-1/2} ≤ |(V W)_uj| ≤ c_2 n^{-1/2}, for all u ∈ [n], j ∈ [d].

The delocalization assumption requires that the common subspace V influence the connectivity of enough edges in the graph. Because the common subspace V is invariant to orthogonal transformations, the assumption only requires the existence of a particular orthogonal matrix W that satisfies the entrywise inequalities. This delocalization assumption is satisfied by a wide variety of graphs; for example, if we take V to be the eigenvectors of the expectation of an Erdős-Rényi graph, its entries are all of order 1/√n, and similarly if V contains the eigenvectors of the probability matrix of a stochastic blockmodel all of whose block sizes grow linearly with n (see Proposition 6 below).

Assumption 3 (Edge variance). There exists a constant $c_3 > 0$ bounding from below the average variance of the edges of each graph, as measured by $s^2(P^{(i)})$ defined in Equation (8).

Assumptions 2 and 3 hold for a wide variety of graphs. In particular, when the community sizes of an SBM are balanced and the connection probabilities do not decay too fast, the next proposition shows that these assumptions hold. The proof is included in the Appendix.

Proposition 6. Let $Z \in \{0,1\}^{n \times K}$ be a membership matrix with community sizes $n_1, \ldots, n_K$, and let $B^{(i)} \in [0,1]^{K \times K}$ be a connectivity matrix.

a) If $\min_{k \in [K]} n_k = \Omega(n)$, then Assumption 2 holds for the common invariant subspace of a multilayer SBM with parameter $Z$.

b) If, in addition to a), $\min_{i \in [m]} \sum_{v=1}^{K} B^{(i)}_{uv}(1 - B^{(i)}_{uv}) = \omega(1/n)$ for every $u \in [K]$, then Assumption 3 also holds.

The next theorem presents the entrywise asymptotic distribution of the estimated score matrices as the size of the graphs increases. The proof is given in the Appendix.
Theorem 7. Suppose that $(A^{(1)}, \ldots, A^{(m)}) \sim \mathrm{COSIE}(V; R^{(1)}, \ldots, R^{(m)})$ is a sample of adjacency matrices from the COSIE model whose parameters satisfy $\min_{i \in [m]} \delta(P^{(i)}) = \omega(\log n)$ and $\varepsilon = O\big(1/\max_{i} \sqrt{\delta(P^{(i)})}\big)$, as well as the requirements of Assumptions 2 and 3. Then there exists a sequence of orthogonal matrices $W \in \mathcal{O}_d$ such that, for each graph $i$ and each entry, the entries of $W^\top \hat{R}^{(i)} W$, centered at the corresponding entries of $R^{(i)}$ plus a bias term $H_m$ and scaled by $s(P^{(i)})$ as defined in Equation (8), converge in distribution to $N(0,1)$, where $N(0,1)$ is the univariate standard normal distribution and $H_m$ is a random matrix whose Frobenius norm vanishes in probability as $m, n \to \infty$.
In addition to Assumptions 2 and 3, to obtain an asymptotic normal distribution of the entries of $\hat{R}^{(i)}$, Theorem 7 requires slightly stronger assumptions than Theorem 3. In particular, the average ASE error $\varepsilon$ needs to decrease at a certain rate.
To illustrate the above, we consider the problem of estimating the eigenvalues of a random graph under the COSIE model. For that goal, we generate a sample of $m$ adjacency matrices from the COSIE model and use the eigenvalues of $\hat{R}^{(1)}$ as an estimate of the eigenvalues of $P^{(1)}$. According to Theorem 7, the entries of $\hat{R}^{(1)}$ are asymptotically normally distributed and centered around a bias term $H_m$ that vanishes as the sample size grows. We conjecture that a similar phenomenon holds for the eigenvalues of $\hat{R}^{(i)}$; namely, that the eigenvalues of $\hat{R}^{(i)}$ are asymptotically normally distributed about the eigenvalues of $P^{(i)}$, albeit with some bias $\eta_m = O_P(1/\sqrt{m})$. These results dovetail nicely with the theory of Tang (2018), which presents a normality result, with bias, for the eigenvalues of stochastic blockmodel graphs. To illustrate the bias-reduction impact of multiple graphs in our case, consider the problem of eigenvalue estimation in both the single- and multiple-graph settings. Figure 5 shows that as the sample size increases, the distribution of the two leading eigenvalues of an SBM graph estimated with MASE approaches the true eigenvalues. When the number of vertices increases but the sample size remains constant, this is not the case, as observed in Figure 6.
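A sketch of this experiment follows, assuming the sample As of adjacency matrices and the true probability matrix P1 of the first graph are available (e.g., from a simulation like the ones in the next section); the helper names are ours.

    import numpy as np

    def mase_V(As, d):
        # Crude unscaled ASE of each graph (d largest eigenvalues),
        # followed by a joint SVD of the concatenated embeddings.
        U = np.hstack([np.linalg.eigh(A)[1][:, -d:] for A in As])
        V, _, _ = np.linalg.svd(U, full_matrices=False)
        return V[:, :d]

    def eigenvalue_errors(As, P1, d):
        # Compare the spectrum of R-hat(1) = V^T A(1) V with that of P(1).
        # Assumes the d leading eigenvalues are positive, as in the
        # assortative SBMs simulated here.
        V = mase_V([A.astype(float) for A in As], d)
        lam_hat = np.sort(np.linalg.eigvalsh(V.T @ As[0] @ V))[::-1]
        lam_true = np.sort(np.linalg.eigvalsh(P1))[::-1][:d]
        return lam_hat - lam_true

Averaging the returned differences over Monte Carlo replicates traces the bias as a function of $m$, which is how the curves in Figures 5 and 6 can be reproduced in spirit.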

Simulations
In this section, we study the empirical performance of MASE for estimating the parameters of the COSIE model, as well as the performance of this method in subsequent inference tasks, including community detection, graph classification, and hypothesis testing. In all cases, we compare with state-of-the-art models and methods for multiple-graph data as baselines. The results show that the COSIE model is able to effectively handle heterogeneous distributions of graphs and leverage the information across all of them, and that MASE demonstrates good empirical performance in different settings.

Subspace estimation error
We first study the performance in estimating the common subspace $V$ using the estimator $\hat{V}$ obtained by MASE, in a setting where the number of graphs $m$ increases. Given a pair of matrices with orthonormal columns $V$ and $\hat{V}$, we measure the distance between their invariant subspaces via the spectral norm of the difference between the projections, $\|V V^\top - \hat{V}\hat{V}^\top\|$. This distance is zero only when there exists an orthogonal matrix $W \in \mathcal{O}_d$ such that $\hat{V} = V W$. Given a sample of graphs $A^{(1)}, \ldots, A^{(m)}$, the ASE of the average of the graphs, $\bar{A} = \frac{1}{m}\sum_{i=1}^{m} A^{(i)}$, provides a good estimator of the invariant subspace when all the graphs have the same expectation matrix (Tang et al., 2018). However, when each graph has a different expected value, $\bar{A}$ approaches the population average, which might not contain useful information about the structure of the invariant subspace. MASE, on the other hand, is able to handle heterogeneous structure and does not require such strict assumptions. We compare the performance of these two methods in both settings, i.e., equal and different distributions of the graphs.
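In code, the subspace error used throughout this section is a one-liner; a sketch, assuming both arguments are $n \times d$ with orthonormal columns:

    import numpy as np

    def subspace_distance(V, V_hat):
        # Spectral norm of the difference of projection matrices; invariant
        # to right-multiplication of either argument by an orthogonal matrix.
        return np.linalg.norm(V @ V.T - V_hat @ V_hat.T, 2)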
We simulate graphs from a three-block multilayer stochastic blockmodel with membership matrix $Z \in \{0,1\}^{n \times 3}$ and connectivity matrices $B^{(1)}, \ldots, B^{(m)}$. The number of vertices is fixed at $n = 3^6 = 729$, with equal-sized communities. We simulate two scenarios. In the first scenario, all graphs have the same connectivity matrix. In the second scenario, the entries of each matrix $B^{(i)}$ are generated independently and uniformly at random. We then compare the performance of the different methods as the number of graphs increases. We consider both the unscaled and scaled versions of MASE, and ASE on the mean of the adjacency matrices (ASE(mean)). Figure 7 shows the average subspace estimation error over several Monte Carlo experiments (25 simulations in the first scenario, and 100 in the second). In the first scenario, the three methods exhibit similar performance, with the error decreasing as a function of the sample size; although ASE(mean) has a slightly better performance, the performance of MASE is statistically no worse than that of ASE(mean). In the second scenario, when the graphs have different expected values, ASE(mean) performs poorly even with a large number of graphs, while MASE maintains a small error and is still able to improve its performance with a larger sample size. These results corroborate our theory that MASE is effective in leveraging the information across multiple heterogeneous graphs, and that it performs similarly to the ASE of the average of the graphs when the sample has a homogeneous distribution.
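A sketch of the second, heterogeneous scenario, with the entries of each $B^{(i)}$ resampled uniformly as described above (the seed and the value of m here are arbitrary):

    import numpy as np

    rng = np.random.default_rng(0)
    n, K, m = 729, 3, 32
    z = np.repeat(np.arange(K), n // K)              # equal-sized communities

    def sample_graph():
        B = rng.uniform(0, 1, size=(K, K))
        B = (B + B.T) / 2                            # symmetric connectivity
        P = B[np.ix_(z, z)]                          # P_uv = B[z_u, z_v]
        U = np.triu((rng.uniform(size=(n, n)) < P).astype(int), 1)
        return U + U.T                               # symmetric, hollow graph

    As = [sample_graph() for _ in range(m)]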
The results of Figure 7 also show that in some circumstances the scaled MASE can perform significantly better than its unscaled counterpart in estimating the common invariant subspace. When some eigenvalues of a matrix $B^{(i)}$ are close to zero, the estimation of the eigenvectors corresponding to the smallest non-zero eigenvalues is difficult. The eigenvalue scaling in ASE can help MASE discern the dimensions of $\hat{V}$ that are more accurately estimated, which might explain the improved performance of the scaled MASE in this simulation. Our current theoretical results only cover the unscaled MASE, but the simulations encourage extensions of the analysis to the scaled MASE.

Modeling heterogeneous graphs
Now, we study the performance of MASE in modeling the structure of graphs with different distributions. The simulated graphs are distributed according to the multilayer SBM with two blocks, but now each graph $i$ has an associated class label $y_i \in \{1, 2, 3, 4\}$. The connection matrices $B^{(i)}$ fall into four different classes depending on the label, with each class obtained by adding a class-specific perturbation $\alpha C^{(k)}$ when $y_i = k$, where $\alpha \geq 0$ is a constant that controls the separation between the classes and the matrices $C^{(k)}$ encode the class-specific structure.
This choice of graphs allows us to simultaneously study the effect of the difference between the classes and of the smallest eigenvalue of the score matrices. When $\alpha = 0$, all score matrices are the same, but the smallest eigenvalue is also zero, so both subspace estimation and graph classification are hard problems; as $\alpha$ increases, the signal for both problems improves.
We simulated 40 graphs, 10 in each class, all with $n = 256$ vertices and equal-sized communities. We compare the performance of MASE with other latent space approaches for modeling multiple graphs: the joint embedding of graphs (JE) (Wang et al., 2017), the multiple RDPG (MRDPG) (Nielsen and Witten, 2018), and the omnibus embedding (OMNI) (Levin et al., 2017). The first two methods, JE and MRDPG, are based on a model related to COSIE. In particular, the expected value of each adjacency matrix is described as $H \Lambda^{(i)} H^\top$, where $H$ is an $n \times d$ matrix and $\Lambda^{(i)}$ is a $d \times d$ diagonal matrix. In JE, the matrix $H$ is only restricted to have columns with norm 1, which makes the model non-identifiable, while MRDPG further restricts $H$ to be orthogonal and $\Lambda^{(i)}$ to have nonnegative entries. Both models are fitted by optimizing a least squares loss function to obtain estimates $\hat{H}$ and $\{\hat{\Lambda}^{(i)}\}_{i=1}^{m}$. The omnibus embedding obtains individual latent positions for each graph by jointly embedding the graphs into the same space. Given a sample of $m$ adjacency matrices, the omnibus embedding constructs an omnibus matrix $M \in \mathbb{R}^{mn \times mn}$ in which each off-diagonal submatrix is the average of a pair of graphs in the sample, $M_{ij} = (A^{(i)} + A^{(j)})/2$. The omnibus embedding consists of the ASE of $M$; each block of $n$ rows of the embedding corresponds to the estimated latent positions of one graph, denoted $\hat{X}^{(i,\mathrm{OMNI})}$. For MASE and OMNI, we use an embedding dimension $d = 2$; JE and MRDPG represent the scores of each graph with a diagonal matrix, so in order to capture all the variability in the data we use $d = 3$ for the embedding dimension. In all comparisons, the two versions of MASE (scaled and unscaled ASE) performed very similarly, and therefore we only report the results of unscaled MASE.
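The omnibus matrix construction is easy to state in code; the sketch below builds $M$ from a list of adjacency matrices, with block $(i,j)$ the average of graphs $i$ and $j$, so the diagonal blocks are the original graphs.

    import numpy as np

    def omnibus_matrix(As):
        # Block (i, j) is the average of graphs i and j; diagonal blocks
        # reduce to the original adjacency matrices.
        m = len(As)
        return np.block([[(As[i] + As[j]) / 2 for j in range(m)]
                         for i in range(m)])

The ASE of this $(mn) \times (mn)$ matrix then yields $m$ stacked blocks of $n$ rows, one set of latent positions per graph.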

Semi-supervised graph classification
First, we measure the accuracy of these methods in classifying the graphs. We use the simulated graphs as training data, and generate a second set of graphs with the same characteristics to use as test data.
To compute the classification accuracy, we first use each method to obtain an embedding of all the graphs, including both training and test data. We then use the Frobenius distance between the individual scores of each graph to build a 1-nearest-neighbor classifier, and use the labels of the training data to classify the test data. The individual scores are $\{\hat{R}^{(i)}\}_{i=1}^{m}$ in MASE, $\{\hat{\Lambda}^{(i)}\}_{i=1}^{m}$ in JE and MRDPG, and $\{\hat{X}^{(i,\mathrm{OMNI})}\}_{i=1}^{m}$ in OMNI. Note that in constructing these scores, the test data was used. This semi-supervised approach is necessary for comparing the methods because OMNI and MRDPG do not have an established out-of-sample extension for the embedding.
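The classification rule reduces to nearest neighbors under the Frobenius metric; a sketch, assuming lists Rs_train and Rs_test of individual score matrices and an array y_train of class labels:

    import numpy as np

    def knn1_predict(Rs_train, y_train, Rs_test):
        # 1-NN under the Frobenius distance on the individual scores.
        preds = []
        for R in Rs_test:
            dists = [np.linalg.norm(R - S, "fro") for S in Rs_train]
            preds.append(y_train[int(np.argmin(dists))])
        return np.array(preds)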
Figure 8 shows the average classification accuracy as a function of $\alpha$ over 50 Monte Carlo simulations. For all methods, the accuracy increases as the separation between the class models grows with $\alpha$, and MASE performs best among all the methods. This result is not surprising, considering that MASE has greater flexibility to model heterogeneous graphs. The performance of OMNI is also excellent, but keep in mind that this method employs a much larger number of parameters to represent each graph. JE is based on a model that can also represent each expected adjacency matrix correctly, but the non-convexity and over-parametrization of the objective function can make the optimization problem challenging, and the performance depends on the initial value, which is random, resulting in inferior performance on average. The identifiability constraints of MRDPG limit the type of graphs that this model can represent, and this method is never able to separate two of the classes correctly, resulting in limited performance in practice.

Model estimation
We also compare the performance of the different methods in approximating the probability matrices of the model. For each method and each graph, we obtain an estimate $\hat{P}^{(i)}$ of the generative probability matrix $P^{(i)}$, measure the model estimation error with the normalized mean squared error $\|\hat{P}^{(i)} - P^{(i)}\|_F^2 / \|P^{(i)}\|_F^2$, and report the average estimation error over all the graphs. Figure 9 reports the average Monte Carlo estimation error as a function of $\alpha$. Again, we observe that MASE has the best performance among all the methods. This is not surprising, considering that MASE is designed for this model, but it shows the limitations of the other methods. JE also shows an improvement in estimation as the separation between the classes increases, since it is the only other method that can correctly represent the four classes, but, as in the classification task, its variance is larger. While OMNI can discriminate the graphs correctly, as previously observed, its error in estimating the generative probability matrix is large. MRDPG is again not able to succeed due to the model limitations.
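For MASE, the natural probability matrix estimate is $\hat{P}^{(i)} = \hat{V}\hat{R}^{(i)}\hat{V}^\top$; the error metric above can then be sketched as:

    import numpy as np

    def mase_phat(V_hat, R_hat):
        # COSIE-based estimate of the probability matrix.
        return V_hat @ R_hat @ V_hat.T

    def normalized_mse(P_hat, P):
        # Normalized mean squared error, as reported in Figure 9.
        return (np.linalg.norm(P_hat - P, "fro") ** 2
                / np.linalg.norm(P, "fro") ** 2)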

Community detection
We use the joint latent positions obtained by each method to perform community detection, clustering the rows of each embedding using a Gaussian mixture model (GMM). For MASE, we use the matrix $\hat{V}$ as the estimated joint latent positions, while for JE and MRDPG we use the matrix $\hat{H}$. For OMNI, we take the average of the individual latent positions across graphs, $\frac{1}{m}\sum_{i=1}^{m} \hat{X}^{(i,\mathrm{OMNI})}$. In all cases, because the number of communities is known, we fit two clusters using GMM, and compare with the true communities of the multilayer SBM.
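The clustering step itself is a single call; a sketch, assuming the estimated joint latent positions V_hat (an $n \times d$ array) and a known number of communities K:

    from sklearn.mixture import GaussianMixture

    def gmm_communities(V_hat, K):
        # Cluster the rows of the estimated joint latent positions.
        return GaussianMixture(n_components=K, n_init=5).fit_predict(V_hat)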
The average accuracy in recovering the communities is reported in Figure 10. Spectral clustering methods require enough separation in the smallest non-zero eigenvalue. This is the case for MASE and OMNI, which show poor performance for small $\alpha$, but once the signal is strong enough, both methods perform excellently. MRDPG shows good performance even for small $\alpha$, which could be due to the optimization procedure the method employs to fit its common latent positions. As in all the other tasks, JE shows high variance in its performance.

Modeling the connectivity of brain networks
In this section, we evaluate the ability of our method to characterize differences in brain connectivity between subjects using a set of graphs constructed from diffusion magnetic resonance imaging (dMRI). The data come from the HNU1 study (Zuo et al., 2014), which consists of dMRI records of 30 different healthy subjects who were each scanned ten times over a period of one month. Based on these recordings, we constructed 300 graphs (one per subject and scan) using the NeuroData MRI-to-Graphs (NDMG) pipeline (Kiar et al., 2018). The vertices were registered to the CC200 atlas (Craddock et al., 2012), which identifies 200 vertices in the brain.
We first apply our multiple adjacency spectral embedding method to the HNU1 data. To choose the dimension of the embedding for each individual graph, we use the automatic scree plot selection method of Zhu and Ghodsi (2006), which chooses between 5 and 18 dimensions for each graph (see Figure 11). The joint model dimension for MASE is chosen based on the scree plot of the singular values of the matrix of concatenated adjacency spectral embeddings. We perform the method using both the scaled and unscaled versions of ASE. In both cases, we observe an elbow in the scree plot at $d = 15$ (see Figure 12), and thus we selected this value for our model.
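The elbow selection can be sketched as follows; this is our own re-implementation of the profile-likelihood idea of Zhu and Ghodsi (2006), modeling the ordered singular values as two Gaussian groups with a pooled variance and picking the split that maximizes the likelihood.

    import numpy as np
    from scipy.stats import norm

    def zhu_ghodsi(svals):
        # Profile-likelihood elbow selection on a scree of singular values.
        s = np.sort(np.asarray(svals, float))[::-1]
        p = len(s)
        lls = []
        for q in range(1, p):
            mu1, mu2 = s[:q].mean(), s[q:].mean()
            # Pooled variance under a two-group Gaussian model of the scree.
            var = (np.sum((s[:q] - mu1) ** 2)
                   + np.sum((s[q:] - mu2) ** 2)) / p
            sd = np.sqrt(max(var, 1e-12))
            lls.append(norm.logpdf(s[:q], mu1, sd).sum()
                       + norm.logpdf(s[q:], mu2, sd).sum())
        return int(np.argmax(lls)) + 1               # elbow location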
As mentioned in Section 2, when we apply the MASE method to the HNU1 data, we obtain a matrix of joint latent positions $\hat{V} \in \mathbb{R}^{200 \times 15}$ and a set of symmetric matrices $\{\hat{R}^{(i)}\}_{i=1}^{300}$ of size $15 \times 15$, one for each individual graph, yielding 120 parameters to represent each graph. Again, as we saw in Figure 3 in Section 2, a three-dimensional plot of the latent positions of the vertices of $\hat{V}$ is revealing. Most of the vertices of the CC200 atlas are labeled according to their spatial location as left or right hemisphere, and this structure is reflected in Figure 3; in fact, the plane $v_3 = 0$ appears to be a useful discriminant boundary between the left and right hemispheres. This is additional fortifying evidence that the embedding obtained by MASE is anatomically meaningful.

Figure 11: Histogram of the embedding dimension selected by the method of Zhu and Ghodsi (2006) for each of the 300 graphs in the HNU1 data. The graphs have 200 vertices, and the selected embedding dimensions range between 5 and 18.
The individual graph parameters $\{\hat{R}^{(i)}\}_{i=1}^{300}$, of dimension $15 \times 15$, are difficult to interpret, both because their values are identifiable only with respect to $\hat{V}$ and because of their large dimensionality. Nevertheless, the matrix of pairwise Frobenius distances $D$, with entries $D_{ij} = \|\hat{R}^{(i)} - \hat{R}^{(j)}\|_F$, is invariant to any rotation of $\hat{V}$ according to Proposition 2. This matrix is shown in Figure 13, and reflects that graphs coming from the same individual tend to be more similar to each other than graphs of different individuals. Again, as we saw in Figure 4, we can visualize the geometry of these distances over the space of subjects. We perform multidimensional scaling (MDS) on the matrix $D$ with 5 dimensions for the scaling. Figure 4 shows scatter plots of the positions discovered by MDS for a subset of the graphs corresponding to 5 subjects. We observe that the points representing graphs of the same subject usually cluster together.
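Both steps can be sketched compactly, assuming Rs is the list of 300 estimated score matrices; the MDS call on precomputed dissimilarities mirrors the 5-dimensional scaling used here.

    import numpy as np
    from sklearn.manifold import MDS

    def score_distances(Rs):
        # Pairwise Frobenius distances D_ij = ||R_i - R_j||_F.
        m = len(Rs)
        D = np.zeros((m, m))
        for i in range(m):
            for j in range(i + 1, m):
                D[i, j] = D[j, i] = np.linalg.norm(Rs[i] - Rs[j], "fro")
        return D

    def mds_positions(D, k=5):
        # Metric MDS on the precomputed distance matrix, k dimensions.
        return MDS(n_components=k, dissimilarity="precomputed").fit_transform(D)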
The ability of our method to distinguish between graphs of different subjects is further evaluated via classification and hypothesis testing. In the classification analysis, we compare with other methods based on low-rank embeddings, following a procedure similar to Section 6.2. In a real data setting, low-rank models are only an approximation to the true generating mechanism, and a more parsimonious model that requires fewer parameters to approximate the data accurately is preferable. Hence, for model selection, we measure the accuracy as a function of the number of embedding dimensions, which controls the description length, defined as the total number of parameters used by the model. Figure 14 shows the 10-fold cross-validated error of a 1-nearest-neighbor classifier applied to the individual graph parameters obtained by each method. Note that both MASE and OMNI achieve an almost perfect classification, with only one graph misclassified, in both cases corresponding to subject 20. This graph can be seen to be different from all the other graphs in Figure 13.
Although OMNI can achieve good accuracy with only one embedding dimension, its description length is much larger because OMNI requires $dn$ parameters for each graph, while MASE only uses a symmetric $d \times d$ matrix. MASE achieves optimal classification error with only $d = 9$ dimensions; this classification analysis also offers a supervised way to choose the number of embedding dimensions for MASE, suggesting that 9 might be enough, but we keep the more conservative choice of $d = 15$. Neither JE nor MRDPG achieves optimal classification error even with $d = 20$ dimensions, and they are outperformed by MASE even when the description length is fixed across all methods. This result suggests that MASE is not only accurate, but also more parsimonious than other similar low-rank embeddings.
The matrix of distances between subject parameters obtained by MASE can be used in combination with other distance-based methods to find differences and similarities between subjects, as the previous analyses showed. However, a more principled way to identify whether two graphs are similar is to measure how much of the difference between them can be attributed to random noise. We use the COSIE model to test the hypothesis that the expected edge connectivity of each pair of adjacency matrices is the same; that is, for each pair of graphs $A^{(i)}$ and $A^{(j)}$ we test the hypothesis $H_0: R^{(i)} = R^{(j)}$. We followed the same procedure described in Section 2.3 and used the parametric bootstrap method of Tang et al. (2017b) to obtain a null distribution of the test statistic.
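A heavily simplified sketch of one pairwise test follows: the statistic $\|\hat{R}^{(i)} - \hat{R}^{(j)}\|_F$ is compared against a parametric bootstrap that redraws both graphs from a common fitted probability matrix. The pooled null estimate, the statistic, and the value of n_boot are illustrative simplifications rather than the exact procedure of Tang et al. (2017b).

    import numpy as np

    def bootstrap_pvalue(A_i, A_j, V_hat, n_boot=200, seed=0):
        rng = np.random.default_rng(seed)
        R_i, R_j = V_hat.T @ A_i @ V_hat, V_hat.T @ A_j @ V_hat
        T_obs = np.linalg.norm(R_i - R_j, "fro")
        # Pooled estimate of a common probability matrix under the null.
        P0 = np.clip(V_hat @ ((R_i + R_j) / 2) @ V_hat.T, 0, 1)

        def draw():
            U = np.triu((rng.uniform(size=P0.shape) < P0).astype(float), 1)
            return U + U.T                           # symmetric, hollow graph

        T_boot = [np.linalg.norm(V_hat.T @ (draw() - draw()) @ V_hat, "fro")
                  for _ in range(n_boot)]
        return float(np.mean(np.array(T_boot) >= T_obs))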
Figure 15 shows the p-values of the test performed for every pair of graphs. The diagonal blocks of the matrix correspond to graphs representing the same subject, and in most cases these p-values are large, suggesting that we cannot reject the hypothesis that the graphs have the same distribution, which is reasonable given that the graphs represent the same person's brain. Off-diagonal elements of the matrix in Figure 15 represent the result of the equality test for pairs of graphs corresponding to different subjects, and these exhibit small p-values in general.
For some pairs of subjects, such as 12 and 14, the p-values are also high, suggesting possible similarities in the brain connectivity of these two subjects, which was also observed in Figure 4. The left panel of Figure 16 shows the percentage of rejections of the test for the p-values in the diagonal of the matrix, with the black dotted line representing the expected number of rejections under the null hypothesis. Note that both OMNI and MASE yield slightly more rejections than would be expected by chance under their corresponding models. The right panel of Figure 16 shows the percentage of rejected tests for pairs of graphs corresponding to different subjects. Both MASE and OMNI reject a large portion of those tests.

Discussion
The COSIE model provides a flexible, adaptable model for multiple graphs, one that encompasses a rich collection of independent-edge random graphs and yet retains identifiability and model parsimony. As described in the text, several single-graph models can be easily extended to the multiple-graph setting (Airoldi et al., 2007; Lyzinski et al., 2017; Zhang et al., 2014) within the COSIE framework. The presence of a common subspace and the allowance for different score matrices guarantee that COSIE can be used to approximate important real-world graph heterogeneity. The MASE procedure builds on the long history of spectral graph inference; it is intuitive, scalable, and consistent for the recovery of the common subspace in COSIE. Accurate inference of the common subspace, in turn, enables us to further determine the score parameters, which can then be deployed to discern distinctions between graphs and even subgraphs. Moreover, the MASE procedure enables us to estimate graph eigenvalues, detect communities, and test whether graphs arise from a common distribution. Together, these constitute a critical corpus of graph inference tasks.
Much work remains: allowing the score matrices to grow in rank or capture important population-level properties such as variation in time or multilayer structure; considering estimation and limit results under more relaxed sparsity and delocalization assumptions; and investigating other spectral embeddings, such as the normalized or regularized Laplacian, which has been shown to uncover different network structure (Priebe et al., 2019) and to provide consistent estimators in sparse networks (Le et al., 2017). Indeed, there is a rich array of open problems that can be addressed within the broad foundation of the COSIE model, and it presents a valuable and practical addition to multiple graph inference. Open R source code for MASE is available at https://github.com/jesusdaniel/mase, and in the Python GraSPy package at https://neurodata.io/graspy (Chung et al., 2019).
c) Suppose that $U = V$. Then, multiplying Equation (9) by $V^\top$ and $V$ on the left and right, respectively, establishes that $S^{(i)} = R^{(i)}$.

Estimation of the common invariant subspace
Before presenting the proof of Theorem 3, we first establish the following lemma.
Lemma 8. Let $(A_n)_{n=1}^{\infty}$ be a sequence of adjacency matrices of independent-edge Bernoulli random graphs such that $A_n \in \{0,1\}^{n \times n}$ and $\mathbb{E}[A_n \mid P_n] = P_n$ for some $P_n \in \mathbb{R}^{n \times n}$. Suppose that $\delta(P_n) = \omega(\log n)$. Then there exists some $n_0 \in \mathbb{N}$ such that, for all $n \geq n_0$, the random variable $\|A_n - P_n\|$ is sub-gaussian.

Proof. By Theorem 20 in Athreya et al. (2017), for any $c > 0$ there exists $C > 0$ such that if $\delta(P_n) > C \log n$, then the corresponding tail bound holds for any $n^{-c} < \eta < 1/2$. Taking $c = 1.1$, $\eta = 1/n$, and $n_0$ such that $\delta(P_n) > C \log n$ holds for the appropriate $C$ and all $n \geq n_0$, the stated tail bound follows. Therefore, by Lemma 5.5 in Vershynin (2010), $\|A_n - P_n\|$ is sub-gaussian for all $n \geq n_0$.

Proof of Theorem 3. We first introduce some notation and define the relevant matrices. Note that $\hat{U}\hat{U}^\top = \Pi$, and therefore $\hat{V}$ corresponds to the $d$ leading eigenvectors of $\Pi$.
To prove Theorem 3, we follow the main arguments of Fan et al. (2017) and analyze the terms on the right-hand side of Equation (4) separately.

For the first term, corresponding to $\|V V^\top - \tilde{V}\tilde{V}^\top\|_F$, the Davis-Kahan theorem (see Yu et al. (2015)) provides a bound. Applying the Davis-Kahan theorem again, the spectral norm of each term inside the norm of the numerator can be bounded. Then, to control the error of each term in the numerator of Equation (10), we use the triangle and Jensen inequalities together with the previous bound in Equation (11), where the last inequality follows from Lemma 8, assuming that $n$ is sufficiently large. Using Lemma 4 of Fan et al. (2017) and combining with Equation (10) yields the bound in Equation (12), involving $\delta(P^{(i)})/\lambda^2_{\min}(R^{(i)})$.

To control the error of the second term in Equation (4), we use the triangle inequality and Lemma 2 of Fan et al. (2017). For each $i \in [m]$, suppose that $U$ corresponds to the $d$ leading eigenvectors of $P^{(i)}$ when $\lambda_j(P^{(i)}) = 0$, spanning the same invariant subspace as $V$, and define $U^{(i)} \in \mathbb{R}^{n \times d}$ accordingly; note that $U^{(i)} U^{(i)\top} = V V^\top$. Also, let $\Lambda^{(i)} \in \mathbb{R}^{d \times d}$ be a diagonal matrix containing the $d$ leading eigenvalues of $P^{(i)}$ in magnitude, and define $H^{(i)}$ accordingly; note that $\mathbb{E}[H^{(i)}] = 0$.

By Lemma 2 of Fan et al. (2017), the stated bound holds whenever $\|A^{(i)} - P^{(i)}\| \leq \lambda_{\min}(R^{(i)})/10$. On the other hand, if $\|A^{(i)} - P^{(i)}\| > \lambda_{\min}(R^{(i)})/10$, we use the Davis-Kahan theorem and Equation (13). Define $\mathcal{D}$ as the event in which $\|A^{(i)} - P^{(i)}\| > \lambda_{\min}(R^{(i)})/10$, and apply the triangle inequality.

Combining Equations (12) and (14) gives the desired bound. Finally, the denominator of the first term can be bounded using Weyl's inequality; Equation (14) and the fact that $\varepsilon = o(1)$ ensure that this denominator is bounded away from zero, which completes the proof.

Community detection results
Proof of Corollary 4. To obtain a bound on $\varepsilon$ as defined in Theorem 3, observe that $\delta(P^{(i)}) = \max_{v = 1, \ldots, n} \sum_{u=1}^{n} (P^{(i)})_{uv}$. Theorem 3 then immediately implies the result.

Asymptotic normality results
Proof of Proposition 6. a) Define $\Xi = \mathrm{diag}(n_1, \ldots, n_K)$ as before. Note that $Z\Xi^{-1/2}$ is the common subspace of a multilayer SBM with community membership $Z$. Therefore, it is enough to show that there exists some $W \in \mathcal{O}_K$ such that Assumption 2 holds for $Z\Xi^{-1/2} W$. Define an orthogonal matrix $W$ whose entries are all of order $1/\sqrt{K}$ for all $k, l \in [K]$; such a matrix exists for any $K \geq 1$ (see, for example, Jaming and Matolcsi (2015)).

We now turn to the proof of the asymptotic distribution of $\hat{R}^{(i)}$. We recall an elementary result relating the Frobenius norm and the $\sin\Theta$ distance (see Bhatia (1997)): if two matrices $\hat{U}, U \in \mathbb{R}^{n \times d}$ each have orthonormal columns, then $\|\sin\Theta(\hat{U}, U)\|_F = \frac{1}{\sqrt{2}}\|\hat{U}\hat{U}^\top - U U^\top\|_F$, and for $W = \arg\inf_{W \in \mathcal{O}_d} \|\hat{U} - U W\|_F$ we have $\|\hat{U} - U W\|_F \leq \sqrt{2}\,\|\sin\Theta(\hat{U}, U)\|_F$. (Observe that the expectation bound stated in Theorem 3 immediately applies to $\sin\Theta$ distances up to an additional absolute constant factor.) We now proceed to further quantify the estimation of $R^{(i)}$ via $\hat{R}^{(i)} := \hat{V}^\top A^{(i)} \hat{V}$. In particular, we will prove the following lemma.
Lemma 9. There exist sequences of matrices $B, N \in \mathbb{R}^{d \times d}$, depending on $d$, $m$, and $n$, that collect the bias terms in the expansion of $\hat{R}^{(i)}$ below.

Proof. Letting $E^{(i)} := A^{(i)} - P^{(i)}$, we obtain an expansion of $\hat{R}^{(i)}$. For any orthogonal matrix $W \in \mathbb{R}^{d \times d}$, the first term can be further expanded. Furthermore, let $W = \arg\inf_{W \in \mathcal{O}_d} \|\hat{V} - V W\|_F$. The transpose invariance of the Frobenius and spectral norms, together with Equations (15) and (16), guarantees that the second and third terms on the right-hand side can be bounded above. On the other hand, the second term in Equation (18) can be further expanded as the sum of four matrices, the last of which can be bounded directly. Observe that the second and third terms in this expansion are simply transposes of one another, hence it suffices to analyze the former. Recall that we have a high-probability bound on $\|E^{(i)}\|$ in spectral norm (see Theorem 20 in Athreya et al. (2017)), namely $\mathbb{P}\big(\|E^{(i)}\| > 8\sqrt{\delta(P^{(i)}) \log n}\big) \leq n^{-2}$.
Theorem 3 can be leveraged together with Markov's inequality to show that there exists an absolute constant $C > 0$ for which the corresponding bound holds for any $t > 0$. Thus, there exist (sequences of) matrices $B, N \in \mathbb{R}^{d \times d}$, depending on $d$, $m$, and $n$, for which the expansion holds. Put $H_m = W^\top B W + W^\top N W$; we call $H_m$ the bias term. Recall that $\delta(P^{(i)}) = \max_{v = 1, \ldots, n} \sum_{u=1}^{n} (P^{(i)})_{uv}$, and as such $\|R^{(i)}\| = \|V^\top P^{(i)} V\| \leq \|P^{(i)}\| \leq \delta(P^{(i)})$.
Setting $t$ such that $\frac{1}{t}\sqrt{\delta(P^{(i)}) \log(n)}\left(\sqrt{\frac{d}{m}}\,\varepsilon + \sqrt{d}\,\varepsilon^2\right) \to 0$ as $m, n \to \infty$ guarantees that the Frobenius norm of the bias term $H_m$ vanishes in probability as $n, m \to \infty$; in particular, this holds under the assumption that $(\delta(P^{(i)}))^{1/2}\varepsilon = O(1)$. By Slutsky's theorem, then, the central limit theorem will be complete once we show that the entries of $V^\top E^{(i)} V$ converge to a normal distribution as $n \to \infty$. We address this in our remaining lemma.
Lemma 10. Under the assumptions of Theorem 7, for any $k, l \in [d]$, the $(k,l)$ entry of $V^\top E^{(i)} V$, suitably normalized, converges in distribution to a standard normal random variable.

Figure 1 :
Figure 1: Graphical representation of Algorithm 1 for estimating the common invariant subspace V in a multilayer stochastic blockmodel (see Definition 3).

Figure 2 :
Figure 2: Power for rejecting the null hypothesis that two graphs have the same distribution as a function of the difference

Figure 3 :
Figure 3: Estimated latent positions obtained by MASE on the HNU1 data graphs with a 3-dimensional embedding.

Figure 4 :
Figure 4: Multidimensional scaling of the individual graph representations obtained by MASE for the HNU1 data. The plot shows the positions discovered by MDS for the graphs corresponding to 5 subjects, showing that graphs of the same subject are more similar to each other.

Figure 5 :
Figure 5: The top panel shows the distribution of the difference between the MASE estimates and the true eigenvalues of a single graph. The graphs in the sample are distributed according to a two-block multilayer SBM with $n = 300$ vertices and block connection probability matrix $B^{(i)} = 0.3 I + 0.1\,\mathbf{1}\mathbf{1}^\top$. The distributions appear to be Gaussian, and as the sample size $m$ increases, the mean of the distribution (red solid line) approaches zero (blue dashed line). This phenomenon is also observed in the bottom panel, which shows that the estimation bias decreases with the sample size.

Figure 6 :
Figure 6: The top panel shows the distribution of the difference between the MASE estimates and the true eigenvalues of a single graph. The graphs in the sample are distributed according to a two-block multilayer SBM with block connection probability matrix $B^{(i)} = 0.3 I + 0.1\,\mathbf{1}\mathbf{1}^\top$, a fixed sample size $m = 3$, and varying numbers of vertices $n$. The estimation bias remains positive even for large graph sizes (bottom panel), and for any fixed $n$ the mean of the distribution (red solid line) stays away from zero (blue dashed line), although the distributions appear to be Gaussian.

Figure 7 :
Figure 7: Distance between the true and estimated invariant subspaces computed with MASE (scaled and unscaled ASE) and with the ASE of the mean of the graphs, for a sample of graphs distributed according to a multilayer SBM with 3 communities. In the left panel, all graphs have the same expected matrix; the three methods perform almost identically, with the error decreasing as the sample size increases. In the right panel, the connection probabilities of the SBM are chosen uniformly at random for each graph; ASE on the average adjacency matrix performs poorly, while MASE still improves with a larger sample size.

Figure 8 :
Figure 8: Out-of-sample classification accuracy as a function of the difference between classes for different graph embedding methods. Graphs in the sample are distributed as a multilayer SBM with $n = 256$ vertices, two communities, four different connectivity matrices corresponding to the class labels, and training and test samples of $m = 40$ graphs each. As the separation between the classes increases, all methods improve in performance, but MASE and OMNI show the largest gains.

Figure 9 :
Figure 9: Average normalized mean squared error in estimating the expected adjacency matrices of a sample of graphs using different embedding methods. Graphs in the sample are distributed as a multilayer SBM with $n = 256$ vertices, two communities, four different classes of connectivity matrices, and training and test samples of $m = 40$ graphs each. MASE and JE are the only methods flexible enough to capture the heterogeneous structure of the graphs, but MASE shows the superior performance of the two. As the graphs become more different, the error of MRDPG and OMNI increases.

Figure 10 :
Figure 10: Performance in community detection of Gaussian mixture clustering applied to a common set of vertex latent positions estimated with different methods, for a sample of $m = 40$ two-block multilayer SBM graphs with $n = 256$ vertices and four classes of connectivity matrices. The class separation controls the magnitude of the smallest eigenvalue of the graphs, and as this increases, all methods show better accuracy. MRDPG, which is based on a non-convex optimization problem, performs best for small $\alpha$, but once $\alpha$ is large enough, MASE and OMNI also show excellent performance.

Figure 12 :
Figure 12: Scree plot of the top singular values of the concatenated scaled (left) and unscaled (right) adjacency spectral embeddings of the HNU1 data graphs. The plots show the largest 100 singular values ordered by magnitude. For both the scaled and unscaled methods, we identified an elbow at $d = 15$.

Figure 13 :
Figure 13: Matrix of distances between the estimated score parameters. Each entry of the matrix corresponds to the distance between the MASE-estimated score matrices of a pair of graphs from the HNU1 data, comprising 30 subjects with 10 graphs each. Distances between graphs of the same subject are usually smaller than distances between graphs of different subjects.

Figure 14 :
Figure 14: Cross-validation error in classifying the subject labels of the HNU1 data. For each embedding estimated with a different method, a 1-NN classifier was trained using the distances between the individual graph embeddings, varying the number of embedding dimensions (left plot), which controls the description length of the different embeddings (right plot). MASE and OMNI are the only methods that achieve perfect accuracy, but MASE requires far fewer parameters to do so.

Figure 15 :
Figure 15: Matrix of p-values of the pairwise equal-distribution tests between all pairs of graphs in the HNU1 data.

Figure 16 :
Figure 16: Proportion of rejected tests as a function of the significance level for pairwise equal-distribution tests between pairs of graphs representing scans of the same subject (left) and of different subjects (right). The black dotted line in the left panel corresponds to the identity, indicating the expected proportion of rejections when same-subject graphs share the same distribution. The test constructed with the MASE estimates rejects more often than the one based on the OMNI test statistic.